Consider a random sample in the max-domain of attraction of
a multivariate extreme value distribution such that the dependence 
structure of the attractor belongs to a parametric model. A new es- 
timator for the unknown parameter is defined as the value that min- 
imizes the distance between a vector of weighted integrals of the tail 
dependence function and their empirical counterparts. The minimiza- 
tion problem has, with probability tending to one, a unique, global 
solution. The estimator is consistent and asymptotically normal. The 
spectral measures of the tail dependence models to which the method 
applies can be discrete or continuous. Examples demonstrate the ap- 
plicability and the performance of the method. 

1. Introduction. Statistics of multivariate extremes finds important
applications in fields like finance, insurance, environmental sciences,
aviation safety, hydrology and meteorology. When considering multivariate
extreme events, the estimation of the tail dependence structure is the key
part of the statistical inference. This tail dependence structure,
represented by the stable tail dependence function $l$, becomes rather
complex if the dimension increases. Therefore, it is customary to model
this multivariate function $l$ parametrically, which leads to a
semiparametric model. The interest in parametric tail dependence models has
existed since the early 1960s [Gumbel (1960)], but new models are still
being proposed [Boldi and Davison (2007), Cooley, Davis and Naveau (2010),
Ballani and Schlather (2011)]. Most of the existing estimators of the
parameter, $\theta$, are likelihood-based and their asymptotic behavior is
only known in dimension two [Coles and Tawn (1991), Joe, Smith and Weissman
(1992), Smith (1994), Ledford and Tawn (1996), de Haan, Neves and Peng
(2008), Guillotte, Perron and Segers (2011)]. For many applications, the
bivariate setup is too restrictive. Also, the likelihood-based estimation
methods exclude models that entail a nondifferentiable function $l$, like
the widely used factor models; see (1.1) below.

It is the goal of this paper to present and provide a comprehensive
treatment of novel M-estimators of $\theta$. The estimators can be used in
arbitrary dimension $d$. Moreover, not relying on the differentiability of
$l$, the estimators are broadly applicable. We establish, again for
arbitrary dimension $d$, the asymptotic normality of our estimators, which
yields asymptotic confidence regions and tests for the parameter $\theta$.
The results in this paper make statistical inference possible for many
multivariate extreme value models that either cannot be handled at all by
currently available methods or for which statistical theory has only been
provided for the bivariate case. Monte Carlo simulation studies confirm
that our estimators perform well in practice; see Sections 5 and 6.

The present estimators are a major extension of the method of moments 
estimators for dimension two [Einmahl, Krajina and Segers (2008)]. For ap- 
plications, the crucial difference is that it is now possible to handle truly 
multivariate data. Also, theoretically, extreme value analysis in dimensions 
larger than two is quite challenging, which explains why in many papers 
attention is restricted to the bivariate case. In particular, we establish the 
asymptotic behavior of the nonparametric estimator of $l$ in arbitrary
dimensions and under nonrestrictive smoothness conditions; compare, for instance,
with Drees and Huang (1998) in the bivariate case. Another novel aspect is 
that the method of moments technique is replaced by general M-estimation, 
that is, allowing for more estimating equations than the dimension of the 
parameter space. This more flexible procedure may serve to increase the 
efficiency of the estimator. 

The absence of smoothness assumptions on $l$ makes it possible to estimate
the tail dependence structure of factor models like
$X = (X_1, \ldots, X_d)$, with

(1.1)  $X_j = \sum_{i=1}^{r} a_{ij} Z_i + \varepsilon_j, \qquad j = 1, \ldots, d,$

consisting of the following ingredients: nonnegative factor loadings
$a_{ij}$ and independent, heavy-tailed random variables $Z_i$ called
factors; independent random variables $\varepsilon_j$ whose tails are
lighter than the ones of the factors and which are independent of them.
This kind of factor model is often used in finance, for example, in
modeling market or credit risk [Fama and French (1993), Malevergne and
Sornette (2004), Geluk, de Haan and de Vries (2007)]. From equation (6.3)
below, we see that the stable tail dependence function $l$ of such a factor
model is not everywhere differentiable, causing likelihood-based methods to
break down.

The organization of the paper is as follows. The basics of the tail de- 
pendence structures in multivariate models are presented in Section 2. The 
M-estimator is defined in Section 3. Section 4 contains the main theoretical 
results: consistency and asymptotic normality of the M-estimator, and some 
consequences of the asymptotic normality result that can be used for
construction of confidence regions and for testing. This section also
contains the asymptotic normality result for $\hat l_n$. In Section 5 we
apply the M-estimator to the well-known logistic stable tail dependence
function (5.1). The tail
dependence structure of factor models is studied in Section 6. Both mod- 
els are illustrated with simulated and real data. The proofs are deferred to 
Section 7. 

2. Tail dependence. We will write points in $\mathbb{R}^d$ as
$x = (x_1, \ldots, x_d)$ and random vectors as
$X_i = (X_{i1}, \ldots, X_{id})$, for $i = 1, \ldots, n$. Let
$X_1, \ldots, X_n$ be independent random vectors in $\mathbb{R}^d$ with
common continuous distribution function $F$ and marginal distribution
functions $F_1, \ldots, F_d$. For $j = 1, \ldots, d$, write
$M_n^{(j)} := \max_{i=1,\ldots,n} X_{ij}$. We say that $F$ is in the
max-domain of attraction of an extreme value distribution $G$ if there
exist sequences $a_n^{(j)} > 0$, $b_n^{(j)} \in \mathbb{R}$,
$j = 1, \ldots, d$, such that

(2.1)  $\lim_{n \to \infty} P\Bigl( \frac{M_n^{(1)} - b_n^{(1)}}{a_n^{(1)}} \le x_1, \ldots, \frac{M_n^{(d)} - b_n^{(d)}}{a_n^{(d)}} \le x_d \Bigr) = G(x)$

for all continuity points $x \in \mathbb{R}^d$ of $G$. The margins
$G_1, \ldots, G_d$ of $G$ must be univariate extreme value distributions
and the dependence structure of $G$ is determined by the relation

$-\log G(x) = l(-\log G_1(x_1), \ldots, -\log G_d(x_d))$

for all points $x$ such that $G_j(x_j) > 0$ for all $j = 1, \ldots, d$. The
stable tail dependence function $l : [0,\infty)^d \to [0,\infty)$ can be
retrieved from $F$ via

(2.2)  $l(x) = \lim_{t \downarrow 0} t^{-1} P\{1 - F_1(X_{11}) < t x_1 \text{ or } \ldots \text{ or } 1 - F_d(X_{1d}) < t x_d\}.$

In fact, the joint convergence in (2.1) is equivalent to convergence of the
$d$ marginal distributions together with (2.2).

In this paper we will only assume the weaker relation (2.2). By itself,
(2.2) holds if and only if the random vector
$(1/\{1 - F_j(X_{1j})\})_{j=1}^d$ belongs to the max-domain of attraction
of the extreme value distribution
$G_0(x) = \exp\{-l(1/x_1, \ldots, 1/x_d)\}$ for $x \in (0,\infty)^d$.
Alternatively, the existence of the limit in (2.2) is equivalent to
multivariate regular variation of the random vector
$(1/\{1 - F_j(X_{1j})\})_{j=1}^d$ on the cone
$[0,\infty]^d \setminus \{(0, \ldots, 0)\}$ with limit measure or exponent
measure $\mu$ given by

$\mu(\{z \in [0,\infty]^d : z_1 > x_1 \text{ or } \ldots \text{ or } z_d > x_d\}) = l(1/x_1, \ldots, 1/x_d)$

[Resnick (1987), Beirlant et al. (2004), de Haan and Ferreira (2006)]. The
measure $\mu$ is homogeneous, that is, $\mu(tA) = t^{-1}\mu(A)$, for any
$t > 0$ and any relatively compact Borel set
$A \subset [0,\infty]^d \setminus \{(0, \ldots, 0)\}$, where
$tA := \{tz : z \in A\}$. This homogeneity property yields a decomposition
of $\mu$ into a radial and an angular part [de Haan and Resnick (1977),
Resnick (1987)]. Let
$\Delta_{d-1} := \{w \in [0,1]^d : w_1 + \cdots + w_d = 1\}$ be the unit
simplex in $\mathbb{R}^d$. Associated to $B \subset \Delta_{d-1}$ and
$t > 0$ is the set

$B_t := \Bigl\{ x \in [0,\infty)^d \setminus \{(0, \ldots, 0)\} : \sum_{j=1}^d x_j > t, \; x \Big/ \sum_{j=1}^d x_j \in B \Bigr\}.$

By the homogeneity property of the exponent measure, it holds that
$\mu(B_t) = t^{-1}\mu(B_1)$. Writing $H(B) = \mu(B_1)$ defines a finite
measure $H$ on $\Delta_{d-1}$, called the spectral or angular measure. Any
finite measure satisfying the moment conditions

(2.3)  $\int_{\Delta_{d-1}} w_j \, H(dw) = 1, \qquad j = 1, \ldots, d,$

is a spectral measure. Adding up the $d$ constraints in (2.3) shows that
$H/d$ is a probability measure.

Sometimes it is more convenient to work with the measure $\Lambda$ obtained
from $\mu$ after the transformation
$(x_1, \ldots, x_d) \mapsto (1/x_1, \ldots, 1/x_d)$. The measure $\Lambda$
is also called the exponent measure and it satisfies the homogeneity
property $\Lambda(tA) = t\Lambda(A)$, for any $t > 0$ and Borel set
$A \subset [0,\infty]^d \setminus \{(\infty, \ldots, \infty)\}$.

There is a one-to-one correspondence between the stable tail dependence
function $l$, the exponent measures $\mu$ and $\Lambda$, and the spectral
measure $H$. In particular, we have

(2.4)  $l(x) = \mu(\{(z_1, \ldots, z_d) \in [0,\infty]^d : z_1 > 1/x_1 \text{ or } \ldots \text{ or } z_d > 1/x_d\})$

(2.5)  $\phantom{l(x)} = \Lambda(\{(u_1, \ldots, u_d) \in [0,\infty]^d : u_1 < x_1 \text{ or } \ldots \text{ or } u_d < x_d\})$

(2.6)  $\phantom{l(x)} = \int_{\Delta_{d-1}} \max_{j=1,\ldots,d} \{w_j x_j\} \, H(dw).$



From the above representations and the moment constraints (2.3), it follows
that the function $l$ has the following properties:

• $\max\{x_1, \ldots, x_d\} \le l(x) \le x_1 + \cdots + x_d$ for all
  $x \in [0,\infty)^d$; in particular,
  $l(z, 0, \ldots, 0) = \cdots = l(0, \ldots, 0, z) = z$ for all $z > 0$;

• $l$ is convex; and

• $l$ is homogeneous of order one:
  $l(tx_1, \ldots, tx_d) = t\, l(x_1, \ldots, x_d)$, for all $t > 0$ and
  all $x \in [0,\infty)^d$.

The function $l$ is connected to the function $V$ in Coles and Tawn (1991)
through $l(x) = V(1/x_1, \ldots, 1/x_d)$ for $x \in (0,\infty)^d$.
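The representation (2.6) and the listed properties can be checked numerically for a discrete spectral measure. The following is a minimal sketch, assuming a hypothetical measure $H$ with two atoms in dimension $d = 3$; the coefficient matrix `b` and the helper `stdf` are illustrative choices, not objects defined in this paper.

```python
import numpy as np

# Hypothetical coefficients b[i, j] = H({w_i}) * w_ij (2 atoms, d = 3).
# Each column sums to one, which is exactly the moment condition (2.3).
b = np.array([[0.75, 0.25, 0.5],
              [0.25, 0.75, 0.5]])

masses = b.sum(axis=1)          # mass H({w_i}) of each atom
atoms = b / masses[:, None]     # atoms w_i, each lying on the unit simplex

def stdf(x):
    """l(x) as the integral of max_j(w_j x_j) against the discrete H, see (2.6)."""
    x = np.asarray(x, dtype=float)
    return float((masses * (atoms * x).max(axis=1)).sum())

# Moment conditions (2.3) and total mass d, so that H/d is a probability measure.
moments = (masses[:, None] * atoms).sum(axis=0)
print(moments, masses.sum())

x = np.array([1.0, 2.0, 0.5])
y = np.array([0.3, 0.1, 4.0])
print(x.max() <= stdf(x) <= x.sum())                          # bounds
print(np.isclose(stdf(3.0 * x), 3.0 * stdf(x)))               # homogeneity
print(stdf((x + y) / 2) <= (stdf(x) + stdf(y)) / 2 + 1e-12)   # midpoint convexity
```

The check at a few points is of course no proof; it only illustrates how the three properties follow from the integral representation.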

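The right-hand partial derivatives of $l$ always exist; indeed, by bounded
convergence it follows that for $j = 1, \ldots, d$, as $h \downarrow 0$,

(2.7)  $\frac{1}{h} \bigl( l(x_1, \ldots, x_{j-1}, x_j + h, x_{j+1}, \ldots, x_d) - l(x_1, \ldots, x_{j-1}, x_j, x_{j+1}, \ldots, x_d) \bigr)$
       $= \int_{\Delta_{d-1}} \frac{1}{h} \Bigl( \max\Bigl\{ w_j (x_j + h), \max_{s \ne j} \{w_s x_s\} \Bigr\} - \max\Bigl\{ w_j x_j, \max_{s \ne j} \{w_s x_s\} \Bigr\} \Bigr) H(dw)$
       $\to \int_{\Delta_{d-1}} w_j \, 1\Bigl\{ w_j x_j \ge \max_{s \ne j} \{w_s x_s\} \Bigr\} \, H(dw).$

Similarly, the left-hand partial derivatives exist for all
$x \in (0,\infty)^d$. By convexity, the function $l$ is almost everywhere
continuously differentiable, with its gradient vector of (the right-hand)
partial derivatives as in (2.7).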
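To illustrate the limit in (2.7), the sketch below evaluates the right-hand partial derivative for a discrete spectral measure and compares it with one-sided difference quotients at a point where $l$ has a kink. The coefficients `b` (with `b[i, j]` playing the role of $H(\{w_i\}) w_{ij}$) are hypothetical and chosen as dyadic fractions so that the tie is exact in floating point.

```python
import numpy as np

# Hypothetical discrete spectral measure: b[i, j] = H({w_i}) * w_ij, columns sum to 1.
b = np.array([[0.75, 0.25, 0.5],
              [0.25, 0.75, 0.5]])

def stdf(x):
    """l(x) = sum_i max_j b[i, j] * x_j, the discrete version of (2.6)."""
    return float((b * np.asarray(x, dtype=float)).max(axis=1).sum())

def right_partial(x, j):
    """Right-hand partial derivative in x_j, from the limit in (2.7)."""
    bx = b * np.asarray(x, dtype=float)
    others = np.delete(bx, j, axis=1).max(axis=1)   # max over s != j
    return float(b[bx[:, j] >= others, j].sum())    # atoms where coordinate j attains the max

x = np.array([1.0, 3.0, 1.0])    # b[0,0]*x_1 == b[0,1]*x_2, so l has a kink in x_1
e1 = np.array([1.0, 0.0, 0.0])
h = 2.0 ** -10

right_fd = (stdf(x + h * e1) - stdf(x)) / h   # right-hand difference quotient
left_fd = (stdf(x) - stdf(x - h * e1)) / h    # left-hand difference quotient
print(right_partial(x, 0), right_fd, left_fd)
```

At this point the right-hand and left-hand quotients disagree, which is the kind of nondifferentiability that arises for the factor models mentioned in the Introduction.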

3. Estimation. Let $R_i^j$ denote the rank of $X_{ij}$ among
$X_{1j}, \ldots, X_{nj}$, $i = 1, \ldots, n$, $j = 1, \ldots, d$. For
$k \in \{1, \ldots, n\}$, define a nonparametric estimator of $l$ by

(3.1)  $\hat l_n(x) = \hat l_{k,n}(x) := \frac{1}{k} \sum_{i=1}^{n} 1\Bigl\{ R_i^1 > n + \frac{1}{2} - k x_1 \text{ or } \ldots \text{ or } R_i^d > n + \frac{1}{2} - k x_d \Bigr\};$

see Huang (1992) and Drees and Huang (1998) for the bivariate case. This
definition follows from (2.2), with all the distribution functions replaced
by their empirical counterparts, and with $t$ replaced by $k/n$. Here
$k = k_n$ is such that $k \to \infty$ and $k/n \to 0$ as $n \to \infty$.
The constant 1/2 in the argument of the indicator function helps to improve
the finite-sample properties of the estimator.
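The rank-based estimator (3.1) takes only a few lines to compute. The sketch below is one possible implementation, assuming the data are stored as an $n \times d$ array without ties; the function name `empirical_stdf` and the two deterministic test samples are ours.

```python
import numpy as np

def empirical_stdf(data, k, x):
    """Evaluate the rank-based estimator (3.1) at a single point x in [0, inf)^d."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    x = np.asarray(x, dtype=float)
    # Rank of X_ij among X_1j, ..., X_nj (1 = smallest); assumes no ties.
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    # Indicator of "R_i^1 > n + 1/2 - k x_1 or ... or R_i^d > n + 1/2 - k x_d".
    exceed = (ranks > n + 0.5 - k * x).any(axis=1)
    return exceed.sum() / k

n, k = 100, 10
u = np.arange(n, dtype=float)
comonotone = np.column_stack([u, u])         # perfectly dependent coordinates
countermonotone = np.column_stack([u, -u])   # large values never occur jointly

print(empirical_stdf(comonotone, k, [1.0, 1.0]))        # 1.0, the lower bound max{x_1, x_2}
print(empirical_stdf(countermonotone, k, [1.0, 1.0]))   # 2.0, the upper bound x_1 + x_2
```

The two toy samples reproduce the extreme cases of the bounds on $l$ from Section 2: complete dependence and asymptotic independence.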

In the literature, the stable tail dependence function is often modeled
parametrically. We impose that the stable tail dependence function $l$
belongs to some parametric family
$\{l(\cdot\,;\theta) : \theta \in \Theta\}$, where
$\Theta \subset \mathbb{R}^p$, $p \ge 1$. Note that this is still a large,
flexible model since there is no restriction on the marginal distributions
and the copula is constrained only through $l$; see (2.2).

We propose an M-estimator of $\theta$. Let $q \ge p$. Let
$g = (g_1, \ldots, g_q)^T : [0,1]^d \to \mathbb{R}^q$ be a column vector of
integrable functions such that $\varphi : \Theta \to \mathbb{R}^q$ defined
by

(3.2)  $\varphi(\theta) := \int_{[0,1]^d} g(x)\, l(x;\theta) \, dx$

is a homeomorphism between $\Theta$ and its image $\varphi(\Theta)$. Let
$\theta_0$ denote the true parameter value. The M-estimator
$\hat\theta_n$ of $\theta_0$ is defined as a minimizer of the criterion
function

(3.3)  $Q_{k,n}(\theta) = \Bigl\| \varphi(\theta) - \int g \hat l_n \Bigr\|^2 = \sum_{m=1}^{q} \Bigl( \int_{[0,1]^d} g_m(x) \bigl( \hat l_n(x) - l(x;\theta) \bigr) \, dx \Bigr)^2,$

where $\|\cdot\|$ is the Euclidean norm. In other words, if
$\hat Y_n \in \arg\min_{y \in \varphi(\Theta)} \| y - \int g \hat l_n \|$,
then $\hat\theta_n \in \varphi^{-1}(\hat Y_n)$. Later we show that
$\hat\theta_n$ is, with probability tending to one, unique.
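As an illustration of the definition, the sketch below minimizes the criterion (3.3) for the logistic family $l(x;\theta) = (x_1^{1/\theta} + \cdots + x_d^{1/\theta})^\theta$ of Section 5, with $p = q = 1$ and $g \equiv 1$. It rests on several simplifying assumptions of ours: the integrals over $[0,1]^d$ are approximated by a midpoint rule, the minimization is a plain grid search over $\theta \in [0.05, 1]$, and the function names are hypothetical.

```python
import numpy as np

def empirical_stdf(data, k, pts):
    """Rank-based estimator (3.1), evaluated at each row of pts (shape (m, d))."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    ranks = data.argsort(axis=0).argsort(axis=0) + 1   # assumes no ties
    out = np.empty(len(pts))
    for t, pt in enumerate(pts):
        out[t] = (ranks > n + 0.5 - k * pt).any(axis=1).sum() / k
    return out

def logistic_stdf(pts, theta):
    """Logistic stable tail dependence function, 0 < theta <= 1."""
    return (pts ** (1.0 / theta)).sum(axis=1) ** theta

def m_estimate_logistic(data, k, grid=30, thetas=np.linspace(0.05, 1.0, 96)):
    """Grid-search minimizer of (3.3) with g = 1 (a sketch, not a tuned optimizer)."""
    d = data.shape[1]
    mid = (np.arange(grid) + 0.5) / grid               # midpoint nodes on (0, 1)
    pts = np.stack(np.meshgrid(*([mid] * d), indexing="ij"), axis=-1).reshape(-1, d)
    target = empirical_stdf(data, k, pts).mean()       # approximates int g * l_hat_n
    phi = np.array([logistic_stdf(pts, th).mean() for th in thetas])   # phi(theta) in (3.2)
    return thetas[np.argmin((phi - target) ** 2)]      # minimizer of Q_{k,n} on the grid

rng = np.random.default_rng(0)
independent = rng.random((2000, 2))                    # independence corresponds to theta = 1
print(m_estimate_logistic(independent, k=100))
```

For independent coordinates the estimate should be close to the boundary value $\theta = 1$, and for perfectly dependent coordinates close to the lower end of the grid. In practice a numerical optimizer and a finer quadrature would replace the grid search.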

The fact that our model assumption only concerns a limit relation in the
tail shows up in the estimation procedure through the choice of $k$, which
determines the effective sample size. When we study asymptotic properties
of either $\hat l_n$ or $\hat\theta_n$, $k = k_n$ is an intermediate
sequence, that is, $k \to \infty$ and $k/n \to 0$ as $n \to \infty$. In
practice, the choice of optimal $k$ is a notorious problem, and here we
address this issue in the usual way: we present the finite-sample results
over a wide range of $k$; see Sections 5 and 6.

Remark 3.1. The estimator $\hat\theta_n$ depends on $g$. In line with the
classical method of moments and for computational feasibility, we will
choose $g$ to be a vector of low degree polynomials. In Sections 5 and 6 we
will see that the obtained estimators have a good performance and a wide
applicability. Finding an optimal $g$ is very difficult and statistically
not very useful since such a $g$ depends on the true, unknown $\theta_0$.
For example, when $p = q = 1$, a function $g$ that minimizes the asymptotic
variance is $(\partial/\partial\theta)\, l(x;\theta_0)$. For
two-dimensional and five-dimensional data, a sensitivity analysis on the
choice of $g$ is performed in Section 5. Simple functions like 1 or $x_1$
lead to estimators that perform approximately the same as the
pseudo-estimator based on the optimal $g$. This supports our choices of $g$
and also suggests that the estimator is not so sensitive to the choice of
$g$.

Remark 3.2. Since $l$, part of the model, is parametrically specified, in
principle, pseudo maximum likelihood estimation could be used. This method,
however, does not apply to many interesting models where $l$ is not
differentiable, like the factor model in (1.1). Moreover, no theory is
known for dimensions higher than 2, unless the limit relation (2.2) is
replaced by an equality for all sufficiently small $t$. In this paper, the
emphasis is on higher dimensions and for a large part on the factor model.
Therefore, the pseudo MLE is not an available competitor.

4. Asymptotic results. Let $\hat\Theta_n$ be the set of minimizers of
$Q_{k,n}$ in (3.3), that is,

$\hat\Theta_n := \arg\min_{\theta \in \Theta} \Bigl\| \varphi(\theta) - \int g \hat l_n \Bigr\|^2.$

Note that $\hat\Theta_n$ may be empty or may contain more than one element.
We show that under suitable conditions, a minimizer exists, that it is
unique with probability tending to one, and that it is a consistent and
asymptotically normal estimator of $\theta_0$. In addition, we show that
the nonparametric estimator in (3.1) is asymptotically normal.

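4.1. Notation. Recall the definition of the measure $\Lambda$ from
Section 2. Let $W_\Lambda$ be a mean-zero Wiener process indexed by Borel
sets of $[0,\infty]^d \setminus \{(\infty, \ldots, \infty)\}$ with "time"
$\Lambda$: its covariance structure is given by

(4.1)  $E[W_\Lambda(A_1) W_\Lambda(A_2)] = \Lambda(A_1 \cap A_2)$

for any two Borel sets $A_1$ and $A_2$ in
$[0,\infty]^d \setminus \{(\infty, \ldots, \infty)\}$. Define

(4.2)  $W_l(x) := W_\Lambda(\{u \in [0,\infty]^d \setminus \{(\infty, \ldots, \infty)\} : u_1 < x_1 \text{ or } \ldots \text{ or } u_d < x_d\}).$

Let $W_{l,j}$, $j = 1, \ldots, d$, be the marginal processes

(4.3)  $W_{l,j}(x_j) := W_l(0, \ldots, 0, x_j, 0, \ldots, 0), \qquad x_j \ge 0.$

Define $l_j$ to be the right-hand partial derivative of $l$ with respect to
$x_j$, where $j = 1, \ldots, d$ [see (2.7)]; if $l$ is differentiable,
$l_j$ is equal to the corresponding partial derivative of $l$. Write

(4.4)  $B(x) := W_l(x) - \sum_{j=1}^{d} l_j(x) W_{l,j}(x_j), \qquad \tilde B := \int_{[0,1]^d} g(x) B(x) \, dx.$

The distribution of $\tilde B$ is zero-mean Gaussian with covariance matrix

(4.5)  $\Sigma := \int\!\!\int_{([0,1]^d)^2} E[B(x) B(y)] \, g(x) g(y)^T \, dx \, dy \in \mathbb{R}^{q \times q}.$

Note that if $l$ is parametric, $\Sigma$ depends on the parameter, that is,
$\Sigma = \Sigma(\theta)$.

Assuming $\theta$ is an interior point of $\Theta$ and $\varphi$ is
differentiable in $\theta$, let
$\dot\varphi(\theta) \in \mathbb{R}^{q \times p}$ be the total derivative
of $\varphi$ at $\theta$, and, provided $\dot\varphi(\theta)$ is of full
rank, put

(4.6)  $M(\theta) := (\dot\varphi(\theta)^T \dot\varphi(\theta))^{-1} \dot\varphi(\theta)^T \Sigma(\theta) \dot\varphi(\theta) (\dot\varphi(\theta)^T \dot\varphi(\theta))^{-1} \in \mathbb{R}^{p \times p}.$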
4.2. Results. We state the asymptotic results for the M-estimator,
$\hat\theta_n$, and the asymptotic normality of $\hat l_n$. The latter is a
result of independent interest, and requires continuous partial derivatives
of $l$, which is not an assumption for the asymptotic normality of the
M-estimator. The proofs can be found in Section 7.

Theorem 4.1 (Existence, uniqueness and consistency of $\hat\theta_n$). Let
$g : [0,1]^d \to \mathbb{R}^q$ be integrable.

(i) If $\varphi$ is a homeomorphism from $\Theta$ to $\varphi(\Theta)$ and
if there exists $\varepsilon_0 > 0$ such that the set
$\{\theta \in \Theta : \|\theta - \theta_0\| \le \varepsilon_0\}$ is
closed, then for every $\varepsilon$ such that
$\varepsilon_0 \ge \varepsilon > 0$, as $n \to \infty$,

$P\bigl( \hat\Theta_n \ne \emptyset \text{ and } \hat\Theta_n \subset \{\theta \in \Theta : \|\theta - \theta_0\| \le \varepsilon\} \bigr) \to 1.$

(ii) If in addition to the assumptions of (i), $\theta_0$ is in the
interior of the parameter space, $\varphi$ is twice continuously
differentiable and $\dot\varphi(\theta_0)$ is of full rank, then, with
probability tending to one, $Q_{k,n}$ in (3.3) has a unique minimizer
$\hat\theta_n$. Hence,

$\hat\theta_n \xrightarrow{P} \theta_0 \qquad \text{as } n \to \infty.$

In part (i) of this theorem we assume that the set
$\{\theta \in \Theta : \|\theta - \theta_0\| \le \varepsilon\}$ is closed
for some $\varepsilon > 0$. This is a generalization of the usual
assumption that $\Theta$ is open or closed, and includes a wider range of
possible parameter spaces.

Theorem 4.2 (Asymptotic normality of $\hat\theta_n$). If in addition to the
assumptions of Theorem 4.1(ii), the following two conditions hold:

(C1) $t^{-1} P\{1 - F_1(X_{11}) < t x_1 \text{ or } \ldots \text{ or } 1 - F_d(X_{1d}) < t x_d\} - l(x) = O(t^\alpha)$,
uniformly in $x \in \Delta_{d-1}$ as $t \downarrow 0$, for some
$\alpha > 0$,

(C2) $k = o(n^{2\alpha/(1+2\alpha)})$, for the positive number $\alpha$ of
(C1), and $k \to \infty$ as $n \to \infty$,

then as $n \to \infty$, with $M$ as in (4.6),

(4.7)  $\sqrt{k}\,(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, M(\theta_0)).$

The following consequence of Theorem 4.2 can be used for the construction
of confidence regions. Recall from (2.6) that $H_\theta$ is the spectral
measure corresponding to $l(\cdot\,;\theta)$. Let $\chi^2_\nu$ denote the
$\chi^2$-distribution with $\nu$ degrees of freedom.

Corollary 4.3. If in addition to the conditions of Theorem 4.2, the map
$\theta \mapsto H_\theta$ is weakly continuous at $\theta_0$ and if the
matrix $M(\theta_0)$ is nonsingular, then as $n \to \infty$,

(4.8)  $k\,(\hat\theta_n - \theta_0)^T M(\hat\theta_n)^{-1} (\hat\theta_n - \theta_0) \xrightarrow{d} \chi^2_p.$
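In practice, (4.8) yields an asymptotic confidence region: all values $\theta$ for which the quadratic form stays below the appropriate $\chi^2_p$ quantile. The sketch below evaluates the statistic for $p = 2$; the estimate, the matrix `M_hat` and the value of `k` are hypothetical numbers chosen for illustration, and the quantile uses the closed form available for two degrees of freedom.

```python
import math
import numpy as np

def wald_statistic(theta_hat, theta, M, k):
    """Quadratic form k (theta_hat - theta)^T M^{-1} (theta_hat - theta) from (4.8)."""
    diff = np.asarray(theta_hat, dtype=float) - np.asarray(theta, dtype=float)
    return float(k * (diff @ np.linalg.solve(np.asarray(M, dtype=float), diff)))

# Hypothetical inputs: an estimate, a plug-in matrix M(theta_hat) and the value of k.
theta_hat = [0.50, 0.30]
M_hat = [[0.5, 0.0],
         [0.0, 2.0]]
k = 100

# For 2 degrees of freedom the chi-square cdf is 1 - exp(-x / 2),
# so the 0.95-quantile is available in closed form.
critical = -2.0 * math.log(1.0 - 0.95)

candidate = [0.40, 0.50]
stat = wald_statistic(theta_hat, candidate, M_hat, k)
inside = stat <= critical      # is the candidate in the asymptotic 95% region?
print(stat, critical, inside)
```

For other values of $p$ the quantile would have to come from a statistical library or a table.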

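Let $1 \le r < p$ and $\theta = (\theta_1, \theta_2) \in \Theta \subset
\mathbb{R}^p$, where $\theta_1 \in \mathbb{R}^{p-r}$,
$\theta_2 \in \mathbb{R}^r$. We want to test $\theta_2 = \theta_2^*$
against $\theta_2 \ne \theta_2^*$, where $\theta_2^*$ corresponds to a
submodel. Denote $\hat\theta_n = (\hat\theta_{1n}, \hat\theta_{2n})$, and
let $M_2(\theta)$ be the $r \times r$ matrix corresponding to the lower
right corner of $M$, as below:

(4.9)  $M = \begin{pmatrix} \ast & \ast \\ \ast & M_2 \end{pmatrix} \in \mathbb{R}^{p \times p}.$

Corollary 4.4 (Test). If the assumptions of Corollary 4.3 are satisfied,
and $\theta_0 = (\theta_1, \theta_2^*) \in \Theta$ for some $\theta_1$,
then as $n \to \infty$,

(4.10)  $k\,(\hat\theta_{2n} - \theta_2^*)^T M_2(\hat\theta_{1n}, \theta_2^*)^{-1} (\hat\theta_{2n} - \theta_2^*) \xrightarrow{d} \chi^2_r.$

The above result can be used for testing for a submodel. For example, we
could test for the symmetric logistic model of (5.3) within the asymmetric
logistic one; see Section 5.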

Remark 4.5. The matrices $M$ and $M_2$ are needed for the computation of
the confidence regions and the test statistics. However, computing these
matrices can be challenging. To compute $M$, we first need the $q \times p$
matrix $\dot\varphi(\theta)$, whose $(i,j)$th element is given by
$\int_{[0,1]^d} g_i(x) (\partial/\partial\theta_j)\, l(x;\theta) \, dx$.
The expression itself will depend on the model in use, but usually the
(right-hand) partial derivatives of $l$ can be computed explicitly, whereas
the integral is to be computed numerically in most cases. Second, we need
to calculate the covariance of the process $B$. We see from (4.5) that the
most difficult part will be the expression $E[B(x)B(y)]$. It holds that

$E[B(x)B(y)] = E[W_l(x) W_l(y)] - \sum_{j=1}^{d} l_j(y) E[W_l(x) W_{l,j}(y_j)]$
$\qquad - \sum_{i=1}^{d} l_i(x) E[W_{l,i}(x_i) W_l(y)] + \sum_{i=1}^{d} \sum_{j=1}^{d} l_i(x) l_j(y) E[W_{l,i}(x_i) W_{l,j}(y_j)].$

Using (4.1), (4.2), (4.3) and the relation between $\Lambda$ and $l$, we
can express this in $l$ and its partial derivatives. Numerical integration
is then performed to obtain $\Sigma$.
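One way to carry out this step, sketched here under our own notation, is by inclusion-exclusion. Write $A_x$ for the set appearing in (4.2), $x \vee y$ for the componentwise maximum and $e_j$ for the $j$th unit vector; these symbols are introduced only for this sketch. Since $\Lambda(A_x) = l(x)$ by (2.5) and $A_x \cup A_y = A_{x \vee y}$, the covariances in the display above reduce to evaluations of $l$:

```latex
\begin{align*}
E[W_l(x) W_l(y)]
  &= \Lambda(A_x \cap A_y)
   = \Lambda(A_x) + \Lambda(A_y) - \Lambda(A_x \cup A_y)
   = l(x) + l(y) - l(x \vee y), \\
E[W_l(x) W_{l,j}(y_j)]
  &= l(x) + y_j - l(x \vee y_j e_j), \\
E[W_{l,i}(x_i) W_{l,j}(y_j)]
  &= x_i + y_j - l(x_i e_i \vee y_j e_j),
\end{align*}
```

where the last two lines use $W_{l,j}(y_j) = W_l(y_j e_j)$ and $l(y_j e_j) = y_j$. For $i = j$ the last expression equals $x_i + y_i - \max\{x_i, y_i\} = \min\{x_i, y_i\}$.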

Finally, we show the asymptotic normality of $\hat l_n$. This result is of
independent interest and can be found in the literature for $d = 2$ only
and under stronger smoothness conditions on $l$; see Huang (1992), Drees
and Huang (1998) and de Haan and Ferreira (2006). Here, a large part of its
proof is necessary for the proof of the asymptotic normality of
$\hat\theta_n$, but we wish to emphasize that the asymptotic normality of
$\hat\theta_n$ holds without any differentiability conditions on $l$. Note
that under assumption (C3) below, the process $B$ in (4.4) is continuous,
although $l_j$ may be discontinuous at points $x$ such that $x_j = 0$.

The result is stated in an approximation setting, where $\hat l_n$ and $B$
are defined on the same probability space obtained by a Skorohod
construction. The random quantities involved are only in distribution equal
to the original ones, but for convenience this is not expressed in the
notation.

Theorem 4.6 (Asymptotic normality of $\hat l_n$ in arbitrary dimensions).
If in addition to the conditions (C1) and (C2) from Theorem 4.2 the
following condition holds:

(C3) for all $j = 1, \ldots, d$, the first-order partial derivative of $l$
with respect to $x_j$ exists and is continuous on the set of points $x$
such that $x_j > 0$,

then for every $T > 0$, as $n \to \infty$,

(4.11)  $\sup_{x \in [0,T]^d} \bigl| \sqrt{k}\,(\hat l_n(x) - l(x)) - B(x) \bigr| \xrightarrow{P} 0.$

5. Example 1: Logistic model. The multivariate logistic distribution
function with standard Fréchet margins is defined by

$F(x_1, \ldots, x_d; \theta) = \exp\Bigl\{ -\Bigl( \sum_{j=1}^{d} x_j^{-1/\theta} \Bigr)^{\theta} \Bigr\}$

for $x_1 > 0, \ldots, x_d > 0$ and $\theta \in [0,1]$, with the proper
limit interpretation for $\theta = 0$. The corresponding stable tail
dependence function is given by

(5.1)  $l(x_1, \ldots, x_d; \theta) = (x_1^{1/\theta} + \cdots + x_d^{1/\theta})^{\theta}.$

Introduced in Gumbel (1960), it is one of the oldest parametric models of
tail dependence.

Sensitivity analysis. Here we observe how for the logistic model the
M-estimator changes with different choices of $k$, and for different
functions $g$. Within this model, $p = 1$ and in the simple case of
$p = q = 1$, it is easy to see that the optimal choice for the function $g$
is $(\partial/\partial\theta)\, l(x;\theta_0)$. Since it depends on the
unknown true parameter, this is not a viable option for use in practice,
but, as demonstrated below, some simple alternatives result in estimators
with basically the same finite-sample behavior.

The following analysis is performed for the logistic model with
$\theta_0 = 0.5$, in dimensions 2 and 5. For both settings, we look at 200
replications of samples of size $n = 1500$, and take the threshold
parameters $k \in \{40, 80, \ldots, 320\}$. In the bivariate case we
compare $g_0(x_1, x_2) = 1$, $g_1(x_1, x_2) = x_1$ and
$g_{\mathrm{opt}}(x_1, x_2) = (\partial/\partial\theta)\, l(x_1, x_2; \theta_0)$
as choices for $g$. In the five-dimensional case the functions $g_0$ and
$g_{\mathrm{opt}}$ are defined analogously, and we compare them to two
other functions, $g_1(x) = \sum_{j=1}^{5} x_j$ and
$g_2(x) = \sum_{j=1}^{5} x_j^2$. We use the bias and the Root Mean Squared
Error (RMSE) to assess the performance of the estimators. The results are
presented in Figure 1 for dimensions $d = 2$ (top) and $d = 5$ (bottom).
All of the above choices for $g$ result in similar finite-sample behavior
of the estimator, but the simpler functions $g$ lead to a somewhat better
performance. The RMSEs for some of these $g$ are even lower than the one
for $g_{\mathrm{opt}}$, since they yield a smaller bias.

[Figure 1: panels titled "logistic model, d = 2, n = 1500, $\theta_0$ = 0.5"; legend entry "opt. g"; horizontal axis $k$.]

Fig. 1. Logistic model: the M-estimator for different functions $g$ in
dimension $d = 2$ (top) and $d = 5$ (bottom).

Based on these findings, for the logistic model in dimensions 2 and 5, we
advise the use of the simplest choice of $g$ given by $g_0(x) = 1$, for all
$x > 0$. The choice of $k$ is slightly more delicate, but it seems that for
$n = 1500$ in dimensions 2 and 5, the choices of $k = 150$ and $k = 100$,
respectively, are reasonable.

Comparison with maximum likelihood based estimators. For $d = 2$, we also
compare the M-estimator with $g = 1$ with the censored maximum likelihood
method [see Ledford and Tawn (1996)] and with the maximum likelihood
estimator introduced in de Haan, Neves and Peng (2008). The latter two we
will call the censored MLE and the dHNP MLE, respectively. For 200 samples,
we compute the censored MLE using the function fitbvgpd from the R package
POT [see Ribatet (2011)]; the dHNP MLE is calculated as described in the
original article. Since the thresholds used in these two methods differ,
and since for a different choice of threshold we get a different estimator,
the comparison is not straightforward. We consider the M-estimator and the
dHNP MLE over the range of $k$ values as used above, and for the censored
MLE we take the thresholds such that the expected number of joint
exceedances is between 10 and 160, approximately, which amounts to
thresholds between 5 and 100. This way we observe all estimators for their
best region of thresholds. In Figure 2 we see that the methods perform
roughly the same, the RMSEs being of the same order. The lowest RMSE of the
censored MLE (0.030) is slightly smaller than the lowest RMSE of the
M-estimator (0.034) and the lowest RMSE of the dHNP estimator (0.035), but
the M- and the dHNP estimators are much more robust to the choice of the
threshold.

[Figure 2: panels titled "logistic model, d = 2, n = 1500, $\theta_0$ = 0.5" for the M-estimator, the dHNP MLE and the censored MLE; horizontal axes $k$ (M-estimator, dHNP MLE) and threshold (censored MLE).]

Fig. 2. The M-estimator with $g(x,y) = g_0(x,y) = 1$, the MLE from de Haan,
Neves and Peng (2008) and the censored MLE, $d = 2$.

[Figure 3: panels (a)-(d); legend entries "M-estimator" and "$\hat l_n(1,1,1,1,1)$"; horizontal axis $k$.]

Fig. 3. Logistic model, $d = 5$, $\theta_0 = 0.5$,
$l(1,1,1,1,1;\theta_0) = \sqrt{5}$. (a) Bias of the M-estimator of
$\theta$; (b) RMSE of the M-estimator of $\theta$; (c) bias of the
estimators of $l(1,1,1,1,1;0.5)$; (d) RMSE of the estimators of
$l(1,1,1,1,1;0.5)$.



Further simulation results. We simulate 500 samples of size $n = 1500$ from
a five-dimensional logistic distribution function with $\theta_0 = 0.5$. As
suggested by the sensitivity analysis, we opt for $g = 1$ when defining
$\hat\theta_n$. The bias and the RMSE of this estimator are shown in the
upper panels of Figure 3.

Also, we consider the estimation of $l(1,1,1,1,1;\theta)$, based on this
M-estimator $\hat\theta_n$. From (5.1) it follows that
$l(1,1,1,1,1;\theta) = 5^{\theta}$. The estimator of this quantity is then
$5^{\hat\theta_n}$. Since $\theta_0 = 0.5$, the true parameter is
$\sqrt{5}$. We compare the bias and the RMSE of this estimator and of the
nonparametric estimator $\hat l_n(1,1,1,1,1)$; see (3.1). The lower panels
in Figure 3 show that the M-estimator performs better than the
nonparametric estimator for almost every $k$.
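The plug-in step is easy to reproduce. The sketch below evaluates the logistic stable tail dependence function (5.1) at $(1,1,1,1,1)$ and compares it with $5^{\theta}$; the value `theta_hat = 0.52` is a made-up estimate used only for illustration, not a result from the simulation study.

```python
import math

def logistic_stdf(x, theta):
    """Logistic stable tail dependence function (5.1), for 0 < theta <= 1."""
    return sum(xj ** (1.0 / theta) for xj in x) ** theta

ones = [1.0] * 5
true_value = logistic_stdf(ones, 0.5)    # l(1,1,1,1,1; 0.5), which equals sqrt(5)

theta_hat = 0.52                          # hypothetical M-estimate of theta
plug_in = 5.0 ** theta_hat                # plug-in estimator of l(1,1,1,1,1; theta)
print(true_value, plug_in, plug_in - true_value)
```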
[Figure 4: left panel "Scatter plot of the Loss-ALAE data", right panel "Plot of the ranks of the Loss-ALAE data"; horizontal axes: Loss.]

Fig. 4. The insurance claims Loss-ALAE data.

Real data: Testing and estimation. We use the bivariate Loss-ALAE data set,
consisting of 1500 insurance claims, comprising losses and allocated loss
adjustment expenses; for more information, see Frees and Valdez (1998). The
scatterplots of the data and their joint ranks are shown in Figure 4. We
consider the asymmetric logistic model described below for their tail
dependence function and we test whether a more restrictive, symmetric
logistic model suffices to describe the tail dependence of these data. The
asymmetric logistic tail dependence function was introduced in Tawn (1988)
as an extension of the logistic model. In dimension $d = 2$ it is given by

(5.2)  $l(x, y; \theta, \psi_1, \psi_2) = (1 - \psi_1) x + (1 - \psi_2) y + \bigl( (\psi_1 x)^{1/\theta} + (\psi_2 y)^{1/\theta} \bigr)^{\theta}$

with the dependence parameter $\theta \in [0,1]$ and the asymmetry
parameters $\psi_1, \psi_2 \in [0,1]$. This model yields a spectral measure
$H$ with atoms at $(1,0)$ and $(0,1)$ whenever $\psi_1 < 1$ and
$\psi_2 < 1$. When $\psi_1 = \psi_2 =: \psi$, we have the symmetric tail
dependence function

(5.3)  $l(x, y; \theta, \psi) = (1 - \psi)(x + y) + \psi\,(x^{1/\theta} + y^{1/\theta})^{\theta}.$

For the given data, we test whether the use of this symmetric model is
justified, as opposed to the wider asymmetric logistic model. Setting
$\eta_1 := (\psi_1 + \psi_2)/2 \in [0,1]$ and
$\eta_2 := (\psi_1 - \psi_2)/2 \in [-1/2, 1/2]$, we reparametrize the model
in (5.2) so that testing for symmetry amounts to testing whether
$\eta_2 = 0$. By Corollary 4.4, the test statistic is given by

$S_n := k\, \hat\eta_{2n}^2 \big/ M_2(\hat\theta_n, \hat\eta_{1n}, 0).$

Table 1 below shows the obtained values of $S_n$ for the Loss-ALAE data for
selected values of $k$.

Table 1
Values of the test statistic $S_n$ for the Loss-ALAE data for selected values of $k$

  k       50      100     150     200     250
  S_n     0.041   0.139   0.294   0.477   0.681

Since the critical value at the 5% level is 3.84 (the 0.95-quantile of the
$\chi^2_1$ distribution), the null hypothesis is clearly not rejected.
Hence, we adopt the symmetric tail dependence model (5.3) and we compute
the M-estimates of $(\theta, \eta_1) = (\theta, \psi)$, the auxiliary
functions being $g_1(x,y) = x$ and $g_2(x,y) = 2(x + y)$. For $k = 150$, we
obtain $(\hat\theta, \hat\psi) = (0.65, 0.95)$ with estimated standard
errors 0.032 for $\hat\theta$ and 0.014 for $\hat\psi$.
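The test decision and the reported standard errors can be turned into numbers with a few lines of code. The sketch below takes the values of the test statistic from Table 1 and the estimates for $k = 150$ from the text; the normal-approximation intervals are our own illustration of how Theorem 4.2 would typically be used and are not reported in the paper.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)    # 0.975-quantile of the standard normal
critical = z ** 2                  # 0.95-quantile of chi-square with 1 d.f.

# Test statistic values from Table 1, indexed by k.
table1 = {50: 0.041, 100: 0.139, 150: 0.294, 200: 0.477, 250: 0.681}
rejected = {k: s > critical for k, s in table1.items()}

def wald_interval(estimate, std_error):
    """Two-sided 95% interval based on asymptotic normality (an assumption here)."""
    return (estimate - z * std_error, estimate + z * std_error)

ci_theta = wald_interval(0.65, 0.032)   # dependence parameter
ci_psi = wald_interval(0.95, 0.014)     # asymmetry parameter

print(round(critical, 2), rejected)
print(ci_theta, ci_psi)
```

None of the tabulated values comes close to the critical value, in line with the conclusion that symmetry is not rejected.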

6. Example 2: Factor model. Consider the $r$-factor model,
$r \in \mathbb{N}$, in dimension $d$: $X' = (X'_1, \ldots, X'_d)$ and

(6.1)  $X'_j = \sum_{i=1}^{r} a_{ij} Z_i + \varepsilon_j, \qquad j \in \{1, \ldots, d\},$

with $Z_i$ independent Fréchet($\nu$) random variables, $\nu > 0$, with
$\varepsilon_j$ independent random variables which have a lighter right
tail than the factors and are independent of them, and with $a_{ij}$
nonnegative constants such that $\sum_{j=1}^{d} a_{ij} > 0$ for all $i$.
Factor models of this type are common in various applications; for example,
in finance, see Fama and French (1993), Malevergne and Sornette (2004),
Geluk, de Haan and de Vries (2007). However, for the purpose of studying
the tail properties, it is more convenient to consider the (max) factor
model: $X = (X_1, \ldots, X_d)$ and

(6.2)  $X_j = \max_{i=1,\ldots,r} \{a_{ij} Z_i\}, \qquad j \in \{1, \ldots, d\},$

with $a_{ij}$ and $Z_i$ as above. Note that $X'$ and $X$ have the same tail
dependence function $l$; this essentially follows from the fact that the
ratio of the probabilities of the sum and the maximum of the $a_{ij} Z_i$
exceeding $x$ tends to 1 as $x \to \infty$ [Embrechts, Klüppelberg and
Mikosch (1997), page 38]. Let $W_i = Z_i^{\nu}$, $i = 1, \ldots, r$, and
observe that the $W_i$ are standard Fréchet random variables. Define a
$d$-dimensional random vector $Y = (Y_1, \ldots, Y_d)$ by

$Y_j := X_j^{\nu} = \max_{i=1,\ldots,r} \{a_{ij}^{\nu} W_i\}, \qquad j \in \{1, \ldots, d\}.$

It is easily seen that, as $x \to \infty$,

$1 - F_{Y_j}(x) = 1 - \exp\Bigl\{ -\frac{1}{x} \sum_{i=1}^{r} a_{ij}^{\nu} \Bigr\} \sim \frac{1}{x} \sum_{i=1}^{r} a_{ij}^{\nu}.$

Since the $X_j$ variables are increasing transformations of the $Y_j$
variables, the (tail) dependence structures of $X$ and $Y$ coincide. We
will determine the tail dependence function $l$ and the spectral measure
$H$ of $X$.

Lemma 6.1. Let $X$ follow a factor model given by (6.1) or (6.2). Then its
stable tail dependence function is given by

(6.3)  $l(x_1, \ldots, x_d) = \sum_{i=1}^{r} \max_{j=1,\ldots,d} \{b_{ij} x_j\}, \qquad (x_1, \ldots, x_d) \in [0,\infty)^d,$

where $b_{ij} := a_{ij}^{\nu} \big/ \sum_{i'=1}^{r} a_{i'j}^{\nu}$.
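The passage from the loadings $a_{ij}$ to the coefficients $b_{ij}$ and to $l$ in (6.3) can be sketched as follows. The loadings below are hypothetical, the array layout (factors in rows, coordinates in columns) is our convention, and every coordinate is assumed to load on at least one factor so that the column sums are positive.

```python
import numpy as np

def factor_coefficients(a, nu):
    """b_ij = a_ij^nu / sum_i a_ij^nu; a has shape (r, d), rows = factors."""
    powered = np.asarray(a, dtype=float) ** nu
    return powered / powered.sum(axis=0, keepdims=True)   # assumes positive column sums

def factor_stdf(b, x):
    """Stable tail dependence function (6.3): sum over factors of max_j b_ij x_j."""
    return float((b * np.asarray(x, dtype=float)).max(axis=1).sum())

# Hypothetical loadings: r = 2 factors, d = 2 coordinates.
a = np.array([[1.0, 2.0],
              [1.0, 1.0]])

b1 = factor_coefficients(a, nu=1.0)
b2 = factor_coefficients(a, nu=2.0)
print(b1.sum(axis=0))                   # each column sums to one
print(factor_stdf(b1, [1.0, 1.0]))      # 2/3 + 1/2
print(factor_stdf(b2, [1.0, 1.0]))      # 0.8 + 0.5
```

A heavier weight on the tail index $\nu$ changes the $b_{ij}$, and hence $l$, even though the loadings stay the same.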



Next, we are looking for a measure H on the unit simplex A^-i = {w G 
[0, oo)*^ : tfi H \-Wd=l} such that for all x G [0, oo)'^, 

y max {bijXj} = l{xi, . . . ,Xd) = / max {wjXj}H{dw). 

This H is a discrete measure with r atoms given by 

the atom receiving mass bij, which is positive by assumption. Such mea- 
sure H is indeed a spectral measure, for 

(6.5) f wjH{dw) = ^bij = l, jG{l,...,d}. 

Every discrete spectral measure can arise in this way. This model for tail 
dependence is considered also in Ledford and Tawn (1998). Extensions to 
random fields are considered, for instance, in Wang and Stoev (2011). 

The spectral measure is completely determined by the $r \times d$ parameters
$b_{ij}$, but by the $d$ moment conditions from (6.5), the actual number of
parameters is $p = (r - 1)d$. The parameter vector $\theta \in \mathbb{R}^p$, which is to be
estimated, can be constructed in many ways. For identification purposes, the
definition of $\theta$ should be unambiguous. We opt for the following approach.
Consider the matrix of the coefficients $b_{ij}$,

$$\begin{pmatrix} b_{11} & \cdots & b_{r1} \\ \vdots & \ddots & \vdots \\ b_{1d} & \cdots & b_{rd} \end{pmatrix} \in \mathbb{R}^{d \times r}.$$

The coefficients corresponding to the $i$th factor, $i = 1, \ldots, r$, are in the $i$th
column of this matrix. We define $\theta$ by stacking the above columns in
decreasing order of their sums, leaving out the column with the lowest sum.
(If two columns have the same sum, we order them in decreasing
lexicographic order.)
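
The stacking rule above can be sketched as follows. This is only an illustration under the assumption that each factor's column of loadings is stored as a Python list; the helper names `theta_from_loadings` and `loadings_from_theta` are hypothetical. The inverse map recovers the omitted column from the moment conditions (6.5).

```python
from typing import List


def theta_from_loadings(columns: List[List[float]]) -> List[float]:
    """Stack the factor columns in decreasing order of their sums
    (ties: decreasing lexicographic order) and drop the last one."""
    ordered = sorted(columns, key=lambda c: (sum(c), c), reverse=True)
    return [value for column in ordered[:-1] for value in column]


def loadings_from_theta(theta: List[float], d: int) -> List[List[float]]:
    """Rebuild all r columns; the omitted column follows from sum_i b_ij = 1."""
    kept = [theta[i * d:(i + 1) * d] for i in range(len(theta) // d)]
    last = [1.0 - sum(column[j] for column in kept) for j in range(d)]
    return kept + [last]


if __name__ == "__main__":
    columns = [[0.2, 0.5, 0.7, 0.9], [0.8, 0.5, 0.3, 0.1]]
    theta = theta_from_loadings(columns)
    print(theta)
    print(loadings_from_theta(theta, d=4))
```

For the four-dimensional two-factor example used later in this section, the first column has the larger sum, so $\theta$ consists of that column alone and $p = (2 - 1) \times 4 = 4$.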

The definition of the M-estimator of $\theta$ involves integrals of the form

$$\int_{[0,1]^d} g_m(x)\, l(x)\, dx = \sum_{i=1}^{r} \int_{[0,1]^d} g_m(x) \max_{j=1,\ldots,d} \{b_{ij} x_j\}\, dx,$$

where $g_m : [0,1]^d \to \mathbb{R}$ is integrable and $m = 1, \ldots, q$. A possible choice is
$g_m(x) = x_k^s$, where $k \in \{1, \ldots, d\}$ and $s > 0$.

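Lemma 6.2. If $l$ is the tail dependence function of a factor model such
that all $b_{ij} > 0$, then

$$\int_{[0,1]^d} x_k^s\, l(x)\, dx = \sum_{i=1}^{r} \sum_{j=1}^{d} \frac{b_{ij}}{(1+s)^{1-\delta_{jk}}} \int_0^1 x_j^{1+s\delta_{jk}} \Bigl(\frac{b_{ij} x_j}{b_{ik}} \wedge 1\Bigr)^{s(1-\delta_{jk})} \prod_{l \neq j} \Bigl(\frac{b_{ij} x_j}{b_{il}} \wedge 1\Bigr)\, dx_j, \tag{6.6}$$

where $\delta_{jk}$ is 1 if $j = k$ and 0 if $j \neq k$.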
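
As a rough numerical sanity check of the dimension reduction behind Lemma 6.2, the sketch below compares a brute-force midpoint rule on $[0,1]^d$ with a sum of one-dimensional integrals. The one-dimensional form is the one obtained by following the proof of the lemma: split the cube according to which coordinate attains the maximum, then integrate out the other coordinates. The function names, grid sizes and test loadings are assumptions made for illustration only.

```python
from itertools import product
from typing import List

Matrix = List[List[float]]  # assumed layout: row i = factor i, all entries > 0


def integral_grid(b: Matrix, k: int, s: float, m: int = 30) -> float:
    """Midpoint rule on an m**d grid for the integral of x_k**s * l(x)."""
    d = len(b[0])
    points = [(t + 0.5) / m for t in range(m)]
    total = 0.0
    for x in product(points, repeat=d):
        l_x = sum(max(bij * xj for bij, xj in zip(row, x)) for row in b)
        total += x[k] ** s * l_x
    return total / m ** d


def integral_one_dim(b: Matrix, k: int, s: float, m: int = 2000) -> float:
    """Same integral as a double sum of one-dimensional midpoint rules."""
    d = len(b[0])
    total = 0.0
    for row in b:
        for j in range(d):
            # Integrating x_k**s over x_k (k != j) produces the factor 1/(1+s).
            const = row[j] if j == k else row[j] / (1.0 + s)
            acc = 0.0
            for t in range(m):
                xj = (t + 0.5) / m
                value = xj ** (1.0 + (s if j == k else 0.0))
                for l in range(d):
                    if l != j:
                        cap = min(row[j] * xj / row[l], 1.0)
                        value *= cap ** (1.0 + (s if l == k else 0.0))
                acc += value
            total += const * acc / m
    return total


if __name__ == "__main__":
    b = [[0.4, 0.7, 0.35], [0.6, 0.3, 0.65]]
    print(integral_grid(b, k=0, s=1.0), integral_one_dim(b, k=0, s=1.0))
```

The grid version costs $m^d$ evaluations, whereas the one-dimensional version costs on the order of $r d^2 m$, which is the practical point of the lemma when $d$ grows.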

We illustrate the performance of the M-estimator on two factor models:
a four-dimensional model with 2 factors ($p = 1 \times 4 = 4$), for simulated data
sets, and a three-dimensional model with 3 factors ($p = 2 \times 3 = 6$), for real
financial data.

The integral on the right-hand side of (6.6) is to be computed numerically.
For the factor model, the dependence of the matrix $M(\theta_0)$ on $g$ is too
complicated to obtain a general solution for the optimal function $g$. Since in
the previous examples low degree polynomials gave very good results, and
since by the previous lemma such a choice simplifies the calculations
significantly (numerical integration in dimension 1, instead of in dimension $d$),
we considered such functions $g$ in a sensitivity analysis. It showed that the
simplest cases give very good results in terms of root mean squared errors 
and that the performance of the M-estimator is quite robust to the partic- 
ular choices of g. Hence, we suggest using simple, low degree polynomials 
for the functions g. The functions g in the following examples are exactly of 
that type. 

Simulation study: Four-dimensional model with two factors. We simulated
500 samples of size $n = 5000$ from a four-dimensional model:

$$\begin{aligned}
X_1 &= 0.2 Z_1 \vee 0.8 Z_2, \\
X_2 &= 0.5 Z_1 \vee 0.5 Z_2, \\
X_3 &= 0.7 Z_1 \vee 0.3 Z_2, \\
X_4 &= 0.9 Z_1 \vee 0.1 Z_2
\end{aligned}$$

with independent standard Fréchet factors $Z_1$ and $Z_2$. We have $\theta = (0.2, 0.5,
0.7, 0.9)$.
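
For readers who want to reproduce the flavor of this experiment, here is a small sketch that draws one sample from the model above and evaluates the rank-based statistic $L_n$ written in terms of ranks in Section 7 (the version with $n + 1$ in the indicator, which differs from $\hat l_n$ by at most $d/k$). The seed, the choice $k = 200$ and the function names are assumptions; this is not the authors' simulation code, and it does not perform the M-estimation step.

```python
import math
import random
from typing import List


def simulate_max_factor(n: int, a: List[List[float]], rng: random.Random) -> List[List[float]]:
    """Draw n observations X_j = max_i a[i][j] * Z_i, Z_i i.i.d. standard Frechet."""
    r, d = len(a), len(a[0])
    sample = []
    for _ in range(n):
        z = []
        for _ in range(r):
            u = rng.random()
            while u == 0.0:  # guard against log(0)
                u = rng.random()
            z.append(-1.0 / math.log(u))  # P(Z <= z) = exp(-1/z)
        sample.append([max(a[i][j] * z[i] for i in range(r)) for j in range(d)])
    return sample


def ranks(column: List[float]) -> List[int]:
    """Rank 1 for the smallest value, n for the largest."""
    order = sorted(range(len(column)), key=column.__getitem__)
    result = [0] * len(column)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def L_n(sample: List[List[float]], k: int, x: List[float]) -> float:
    """(1/k) * #{i : R_i^j > n + 1 - k * x_j for at least one j}."""
    n, d = len(sample), len(sample[0])
    rk = [ranks([row[j] for row in sample]) for j in range(d)]
    hits = sum(1 for i in range(n)
               if any(rk[j][i] > n + 1 - k * x[j] for j in range(d)))
    return hits / k


if __name__ == "__main__":
    a = [[0.2, 0.5, 0.7, 0.9], [0.8, 0.5, 0.3, 0.1]]
    data = simulate_max_factor(5000, a, random.Random(0))
    # True value l(1,1,1,1) = 0.9 + 0.8 = 1.7 for this model.
    print(L_n(data, k=200, x=[1.0, 1.0, 1.0, 1.0]))
```

Because the estimator only uses ranks, the marginal distributions never need to be estimated; only the choice of $k$ matters.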

In Figure 5 we show the bias and the RMSE of the M-estimator based on
$q = 5$ moment equations, with auxiliary functions $g_i(x) = x_i$, for $i = 1, 2, 3, 4$
and $g_5 = 1$. The M-estimator performs very well. For relatively small $k$, the
four components of $\theta$ are estimated equally well, whereas for larger $k$ the
estimator performs somewhat better for parameter values in the "middle"
of the interval (0, 1) than for values near 0 or 1.

[Figure 5: results plotted against $k$ (horizontal axis, 200 to 1000) for $\theta_1 = 0.2$, $\theta_2 = 0.5$, $\theta_3 = 0.7$, $\theta_4 = 0.9$.]

Fig. 5. Four-dimensional 2-factor model, estimation of $\theta = (0.2, 0.5, 0.7, 0.9)$.

Real data: Three-dimensional model with three factors. We consider
monthly negative returns (losses) of three industry portfolios
(Telecommunications, Finance and Oil) over the period July 1, 1926, until December 31,
2009. See Figure 6(a) for the scatterplot of the data; the sample size is $n =
1002$. The data are available at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french.
We are interested in modeling the losses by a factor model. In the asset
pricing literature [see, e.g., Fama and French (1993,
1996)], it is common to model the returns by linear factor models of type (6.1),
with three underlying economic factors. Based on that line of literature, we
also consider a three-factor model for the tails of the three industry portfolios
above; see also Kleibergen (2011).

To estimate the parameter vector with $p = 2 \times 3 = 6$ components, we
need to find a minimum of a 6-dimensional nonlinear criterion function. To
solve such a difficult minimization problem, it is important to have good
starting values. We find a starting parameter vector by applying the
3-means clustering algorithm [see, e.g., Pollard (1984), page 9] to the following
pseudo-data: we transform the data (Telcm, Fin, Oil) to

$$\Bigl(\frac{n}{n + 1 - R_{Ti}}, \frac{n}{n + 1 - R_{Fi}}, \frac{n}{n + 1 - R_{Oi}}\Bigr), \qquad i = 1, \ldots, n,$$

where $R_{Ti}$, $R_{Fi}$ and $R_{Oi}$ are the ranks of the components of the $i$th
observation. Only the entries such that the sum of their values is greater than
the threshold $n/75$ are taken into account, and subsequently normalized so
that they belong to the unit simplex $\Delta_{3-1}$; see Figure 6(b). We compute the
3-means cluster centers for these data. Using equation (6.4), we compute
from these three centers the 6-dimensional starting parameter [as described
below equation (6.5)] for the minimization routine. For the criterion function
we use $q = 7$ functions $g_i$ as follows: $g_i(x) = x_i$ for $i = 1, 2, 3$, $g_i(x) = x_{i-3}^2$
for $i = 4, 5, 6$, and $g_7 = 1$. For different choices of $k$, we obtain the estimates
presented in Table 2. For each $k$, we estimate the loadings of the first two
factors. This corresponds to the first two columns of estimated $b_{ij}$ for each $k$.
The third columns follow from the conditions in (6.5).
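
The starting-value recipe above can be sketched with the standard library alone. Two points are assumptions rather than statements from the text: the clustering is a plain Lloyd iteration with random initial centers, and the cluster masses needed to pass from centers to loadings via (6.4) are taken proportional to cluster sizes, followed by a column renormalization so that (6.5) holds exactly. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Points = List[List[float]]


def rank_pseudo_data(data: Points) -> Points:
    """Map observation i to (n/(n+1-R_i1), ..., n/(n+1-R_id)), R = columnwise rank."""
    n, d = len(data), len(data[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for position, i in enumerate(order, start=1):
            out[i][j] = n / (n + 1 - position)
    return out


def angular_points(pseudo: Points, threshold: float) -> Points:
    """Keep points whose coordinate sum exceeds the threshold; scale to the simplex."""
    return [[v / sum(p) for v in p] for p in pseudo if sum(p) > threshold]


def k_means(points: Points, k: int, rng: random.Random, iters: int = 100) -> Tuple[Points, List[int]]:
    """Plain Lloyd iterations; returns the centers and the cluster sizes."""
    centers = rng.sample(points, k)
    groups: List[Points] = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum((u - v) ** 2 for u, v in zip(p, centers[c])))
            groups[nearest].append(p)
        updated = [[sum(col) / len(g) for col in zip(*g)] if g else centers[c]
                   for c, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers, [len(g) for g in groups]


def starting_loadings(centers: Points, sizes: List[int]) -> Points:
    """Assumed conversion: mass_i = d * size_i / total, b_ij = mass_i * center_ij,
    then renormalize each coordinate so that sum_i b_ij = 1."""
    d, total = len(centers[0]), sum(sizes)
    b = [[d * (m / total) * w for w in c] for c, m in zip(centers, sizes)]
    col = [sum(row[j] for row in b) for j in range(d)]
    return [[row[j] / col[j] for j in range(d)] for row in b]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(3)] for _ in range(600)]  # placeholder data
    pts = angular_points(rank_pseudo_data(data), threshold=600 / 75)
    centers, sizes = k_means(pts, 3, random.Random(2))
    print(starting_loadings(centers, sizes))
```

The rows of the returned matrix play the role of the factor columns $b_{i\cdot}$; stacking them as described below (6.5) would give a starting vector for the minimization routine.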

Observe that the estimates hardly depend on the choice of $k$. We see
that all three portfolios load substantially on the first factor (the first column
of estimated coefficients, for each $k$), but Telecommunications loads more
on the second factor (the first lines of estimated coefficients), and Oil more
on the third factor (the third lines of estimated coefficients). This indicates
that even for only these three portfolios, three factors are required.

Table 2
Estimates for the factor loadings $b_{ij}$ in the three-factor model fitted to the tail of the
Telcm/Fin/Oil data

                   k = 60                      k = 90
Telcm      0.394   0.593   0.013       0.344   0.616   0.040
Fin        0.691   0.211   0.098       0.701   0.216   0.083
Oil        0.358   0.062   0.580       0.368   0.052   0.580

                   k = 120                     k = 150
Telcm      0.387   0.586   0.027       0.388   0.581   0.031
Fin        0.695   0.215   0.090       0.699   0.211   0.090
Oil        0.348   0.058   0.594       0.364   0.086   0.550

7. Proofs. The asymptotic properties of the nonparametric estimator $\hat l_n$
are required for the proofs of the asymptotic properties of the M-estimator.
Consistency of $\hat l_n$ [see (7.1)] for dimension $d = 2$ was shown in Huang (1992);
cf. Drees and Huang (1998). In particular, it holds that for every $T > 0$, as
$n \to \infty$, $k \to \infty$ and $k/n \to 0$,

$$\sup_{(x_1, x_2) \in [0,T]^2} |\hat l_n(x_1, x_2) - l(x_1, x_2)| \xrightarrow{P} 0.$$

The proof translates straightforwardly to general dimension $d$, and together
with integrability of $g$ yields consistency of $\int g \hat l_n$ for $\int g l = \varphi(\theta_0)$. For the
proof of Theorem 4.1, a technical result is needed.

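Let $\mathcal{H}_{k,n}(\theta) \in \mathbb{R}^{p \times p}$ denote the Hessian matrix of $Q_{k,n}$ as a function of $\theta$.
Let $\mathcal{H}(\theta)$ be the deterministic, symmetric $p \times p$ matrix whose $(i, j)$th element,
$i, j \in \{1, \ldots, p\}$, is equal to

$$2 \Bigl(\frac{\partial}{\partial \theta_j} \varphi(\theta)\Bigr)^T \Bigl(\frac{\partial}{\partial \theta_i} \varphi(\theta)\Bigr) - 2 \Bigl(\frac{\partial^2}{\partial \theta_j\, \partial \theta_i} \varphi(\theta)\Bigr)^T (\varphi(\theta_0) - \varphi(\theta)).$$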



Lemma 7.1. If $k/n \to 0$ and if the assumptions of Theorem 4.1(ii) are
satisfied, then as $n \to \infty$ and $k \to \infty$, on some closed neighborhood of $\theta_0$,

(i) $\mathcal{H}_{k,n}(\theta) \xrightarrow{P} \mathcal{H}(\theta)$ uniformly in $\theta$, and

(ii) $P(\mathcal{H}_{k,n}(\theta) \text{ is positive definite}) \to 1$.



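Proof. (i) The Hessian matrix of $Q_{k,n}$ in $\theta$ is a $p \times p$ matrix $\mathcal{H}_{k,n}(\theta)$
with elements $(\mathcal{H}_{k,n}(\theta))_{ij} = \partial^2 Q_{k,n}(\theta) / \partial \theta_j\, \partial \theta_i$, for $i, j \in \{1, \ldots, p\}$, given by

$$\begin{aligned}
(\mathcal{H}_{k,n}(\theta))_{ij}
&= 2 \sum_{m=1}^{q} \Biggl[ \int_{[0,1]^d} g_m(x) \frac{\partial}{\partial \theta_i} l(x; \theta)\, dx \cdot \int_{[0,1]^d} g_m(x) \frac{\partial}{\partial \theta_j} l(x; \theta)\, dx \\
&\qquad\qquad - \int_{[0,1]^d} g_m(x) \frac{\partial^2}{\partial \theta_j\, \partial \theta_i} l(x; \theta)\, dx \times \int_{[0,1]^d} g_m(x) (\hat l_n(x) - l(x; \theta))\, dx \Biggr] \\
&= 2 \Bigl(\frac{\partial}{\partial \theta_j} \varphi(\theta)\Bigr)^T \Bigl(\frac{\partial}{\partial \theta_i} \varphi(\theta)\Bigr) - 2 \Bigl(\frac{\partial^2}{\partial \theta_j\, \partial \theta_i} \varphi(\theta)\Bigr)^T \Bigl(\int_{[0,1]^d} g(x) \hat l_n(x)\, dx - \varphi(\theta)\Bigr).
\end{aligned}$$

The consistency of $\int g \hat l_n$ for $\varphi(\theta_0)$ implies

$$(\mathcal{H}_{k,n}(\theta))_{ij} \xrightarrow{P} 2 \Bigl(\frac{\partial}{\partial \theta_j} \varphi(\theta)\Bigr)^T \Bigl(\frac{\partial}{\partial \theta_i} \varphi(\theta)\Bigr) - 2 \Bigl(\frac{\partial^2}{\partial \theta_i\, \partial \theta_j} \varphi(\theta)\Bigr)^T (\varphi(\theta_0) - \varphi(\theta)) = (\mathcal{H}(\theta))_{ij}.$$

Since we assumed that there exists $\varepsilon_0 > 0$ such that the set $\{\theta \in \Theta : \|\theta - \theta_0\| \le
\varepsilon_0\} =: B_{\varepsilon_0}(\theta_0)$ is closed and thus compact, and since $\varphi$ is assumed to be
twice continuously differentiable, the second derivatives of $\varphi$ are uniformly
bounded on $B_{\varepsilon_0}(\theta_0)$ and, hence, the convergence above is uniform on $B_{\varepsilon_0}(\theta_0)$.

(ii) For $\theta = \theta_0$ we get

$$(\mathcal{H}(\theta_0))_{ij} = 2 \Bigl(\frac{\partial}{\partial \theta_j} \varphi(\theta) \Big|_{\theta = \theta_0}\Bigr)^T \Bigl(\frac{\partial}{\partial \theta_i} \varphi(\theta) \Big|_{\theta = \theta_0}\Bigr),$$

that is,

$$\mathcal{H}(\theta_0) = 2 \dot\varphi(\theta_0)^T \dot\varphi(\theta_0).$$

Since $\dot\varphi(\theta_0)$ is assumed to be of full rank, $\mathcal{H}(\theta_0)$ is positive definite. For $\theta$
close to $\theta_0$, $\mathcal{H}(\theta)$ is also positive definite. Due to the uniform convergence of
$\mathcal{H}_{k,n}(\theta)$ to $\mathcal{H}(\theta)$ on $B_{\varepsilon_0}(\theta_0)$, the matrix $\mathcal{H}_{k,n}(\theta)$ is also positive definite on
$B_{\varepsilon_0}(\theta_0)$ with probability tending to one. □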

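Proof of Theorem 4.1. (i) Fix $\varepsilon > 0$ such that $0 < \varepsilon \le \varepsilon_0$. Since $\varphi$ is
a homeomorphism, there exists $\delta > 0$ such that $\theta \in \Theta$ and $\|\varphi(\theta) - \varphi(\theta_0)\| \le \delta$
implies $\|\theta - \theta_0\| \le \varepsilon$. In other words, for every $\theta \in \Theta$ such that $\|\theta - \theta_0\| > \varepsilon$,
we have $\|\varphi(\theta) - \varphi(\theta_0)\| > \delta$. Hence, on the event

$$A_n = \Bigl\{ \Bigl\| \varphi(\theta_0) - \int g \hat l_n \Bigr\| \le \delta/2 \Bigr\},$$

for every $\theta \in \Theta$ with $\|\theta - \theta_0\| > \varepsilon$, necessarily,

$$\Bigl\| \varphi(\theta) - \int g \hat l_n \Bigr\| \ge \|\varphi(\theta) - \varphi(\theta_0)\| - \Bigl\| \varphi(\theta_0) - \int g \hat l_n \Bigr\| > \delta - \delta/2 = \delta/2 \ge \Bigl\| \varphi(\theta_0) - \int g \hat l_n \Bigr\|.$$

As a consequence, on the event $A_n$, we have

$$\inf_{\theta : \|\theta - \theta_0\| > \varepsilon} \Bigl\| \varphi(\theta) - \int g \hat l_n \Bigr\| \ge \min_{\theta : \|\theta - \theta_0\| \le \varepsilon} \Bigl\| \varphi(\theta) - \int g \hat l_n \Bigr\|,$$

where we can write the minimum on the right-hand side since the set $\{\theta \in
\Theta : \|\theta - \theta_0\| \le \varepsilon\}$ is closed and thus compact for $0 < \varepsilon \le \varepsilon_0$. Hence, on the
event $A_n$, the "argmin" set $\hat\Theta_n$ is nonempty and is contained in the closed
ball of radius $\varepsilon$ centered at $\theta_0$. Finally, $P(A_n) \to 1$ by weak consistency of
$\int g \hat l_n$ for $\int g l = \varphi(\theta_0)$.

(ii) In the proof of (i) we have seen that, with probability tending to
one, the proposed M-estimator exists and it is contained in a closed ball
around $\theta_0$. In Lemma 7.1 we have shown that the criterion function is, with
probability tending to one, strictly convex on such a closed ball around $\theta_0$
and, hence, with probability tending to one, the minimizer of the criterion
function is unique. □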

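For $i = 1, \ldots, n$ let

$$U_i := (U_{i1}, \ldots, U_{id}) := (1 - F_1(X_{i1}), \ldots, 1 - F_d(X_{id}))$$

and denote

$$\begin{aligned}
Q_{nj}(u_j) &:= U_{\lceil n u_j \rceil : n, j}, \qquad j = 1, \ldots, d, \\
S_{nj}(x_j) &:= \frac{n}{k}\, Q_{nj}\Bigl(\frac{k x_j}{n}\Bigr), \qquad j = 1, \ldots, d, \\
S_n(x) &:= (S_{n1}(x_1), \ldots, S_{nd}(x_d)),
\end{aligned}$$

where $U_{1:n,j} \le \cdots \le U_{n:n,j}$ are the order statistics of $U_{1j}, \ldots, U_{nj}$, $j = 1, \ldots, d$,
and $\lceil a \rceil$ is the smallest integer not smaller than $a$. Write

$$\begin{aligned}
V_n(x) &:= \frac{n}{k}\, P\Bigl(U_{11} \le \frac{k x_1}{n} \text{ or } \ldots \text{ or } U_{1d} \le \frac{k x_d}{n}\Bigr), \\
T_n(x) &:= \frac{1}{k} \sum_{i=1}^{n} \mathbf{1}\Bigl\{U_{i1} < \frac{k x_1}{n} \text{ or } \ldots \text{ or } U_{id} < \frac{k x_d}{n}\Bigr\}, \\
L_n(x) &:= \frac{1}{k} \sum_{i=1}^{n} \mathbf{1}\Bigl\{U_{i1} < \frac{k}{n} S_{n1}(x_1) \text{ or } \ldots \text{ or } U_{id} < \frac{k}{n} S_{nd}(x_d)\Bigr\} \\
&\phantom{:}= \frac{1}{k} \sum_{i=1}^{n} \mathbf{1}\{R_i^1 > n + 1 - k x_1 \text{ or } \ldots \text{ or } R_i^d > n + 1 - k x_d\}
\end{aligned}$$

and note that

$$L_n(x) = T_n(S_n(x)).$$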

With probability one, for every $x$ and for every $j \in \{1, \ldots, d\}$, there is at
most one $i$ such that $n + \tfrac{1}{2} - k x_j < R_i^j \le n + 1 - k x_j$. Hence,

$$\sup_{x \in [0,1]^d} \sqrt{k}\, |\hat l_n(x) - L_n(x)| \le \frac{d}{\sqrt{k}} \to 0. \tag{7.1}$$

This shows that the asymptotic properties of $\hat l_n$ and $L_n$ are the same. With
the notation $v_n(x) = \sqrt{k}(T_n(x) - V_n(x))$, we have the following result.

Proposition 7.2. Let $T > 0$ and denote $A_x := \{u \in [0, \infty]^d : u_1 \le x_1 \text{ or }
\cdots \text{ or } u_d \le x_d\}$. There exists a sequence of processes $\tilde v_n$ such that, for all $n$,
$\tilde v_n \stackrel{d}{=} v_n$ and there exists a Wiener process $\tilde W_l(x) := \tilde W_\Lambda(A_x)$ such that as
$n \to \infty$,

$$\sup_{x \in [0,2T]^d} |\tilde v_n(x) - \tilde W_l(x)| \to 0 \quad \text{a.s.} \tag{7.2}$$

The result follows from Theorem 3.1 in Einmahl (1997). From the proofs
there it follows that a single Wiener process, instead of the sequence in
the original statement of the theorem, can be used, and that convergence
holds almost surely, instead of in probability, once the Skorohod construction
is introduced. From now on, we work on this new (Skorohod) probability
space, but keep the old notation, without the tildes. In particular, we have
convergence of the marginal processes:

$$\sup_{x_j \in [0,2T]} |v_{nj}(x_j) - W_{l,j}(x_j)| \to 0 \quad \text{a.s.}, \qquad j = 1, \ldots, d,$$

where $v_{nj}(x_j) := v_n((0, \ldots, 0, x_j, 0, \ldots, 0))$. The Vervaat (1972) lemma implies

$$\sup_{x_j \in [0,2T]} |\sqrt{k}(S_{nj}(x_j) - x_j) + W_{l,j}(x_j)| \to 0 \quad \text{a.s.}, \qquad j = 1, \ldots, d. \tag{7.3}$$

Proof of Theorem 4.6. Write

$$\begin{aligned}
\sqrt{k}(L_n(x) - l(x))
&= \sqrt{k}(T_n(S_n(x)) - V_n(S_n(x))) + \sqrt{k}(V_n(S_n(x)) - l(S_n(x))) \\
&\quad + \sqrt{k}(l(S_n(x)) - l(x)) \\
&=: D_1(x) + D_2(x) + D_3(x).
\end{aligned}$$

Proof of $\sup_{x \in [0,T]^d} |D_1(x) - W_l(x)| \xrightarrow{P} 0$. We have $D_1(x) = v_n(S_n(x))$.
It holds that

$$\sup_{x \in [0,T]^d} |D_1(x) - W_l(x)| \le \sup_{x \in [0,T]^d} |D_1(x) - W_l(S_n(x))| + \sup_{x \in [0,T]^d} |W_l(S_n(x)) - W_l(x)|.$$

Because of (7.3), this is, with probability tending to one, less than or equal
to

$$\sup_{y \in [0,2T]^d} |v_n(y) - W_l(y)| + \sup_{x \in [0,T]^d} |W_l(S_n(x)) - W_l(x)|.$$

Both terms tend to zero in probability, the first one by Proposition 7.2, the
second one because of the uniform continuity of $W_l$ and (7.3).

Proof of $\sup_{x \in [0,T]^d} |D_2(x)| \xrightarrow{P} 0$. Because of (7.3), with probability tending
to one, $\sup_{x \in [0,T]^d} |D_2(x)|$ is less than or equal to $\sup_{y \in [0,2T]^d} \sqrt{k}\, |V_n(y) -
l(y)|$, which in turn, because of conditions (C1) and (C2), is equal to

$$\sqrt{k}\, O\Bigl(\Bigl(\frac{k}{n}\Bigr)^{\alpha}\Bigr) = O\Bigl(\Bigl(\frac{k}{n^{2\alpha/(1+2\alpha)}}\Bigr)^{\alpha + 1/2}\Bigr) = o(1).$$

Proof of $\sup_{x \in [0,T]^d} |D_3(x) + \sum_{j=1}^{d} l_j(x) W_{l,j}(x_j)| \xrightarrow{P} 0$. Due to the
existence of the first derivatives, we can use the mean value theorem to write

$$\frac{1}{\sqrt{k}} D_3(x) = l(S_n(x)) - l(x) = \sum_{j=1}^{d} (S_{nj}(x_j) - x_j) \cdot l_j(\xi_n)$$

with $\xi_n$ between $x$ and $S_n(x)$. Therefore,

$$\sup_{x \in [0,T]^d} \Bigl| D_3(x) + \sum_{j=1}^{d} l_j(x) W_{l,j}(x_j) \Bigr| \le \sum_{j=1}^{d} \sup_{x \in [0,T]^d} |l_j(\xi_n) \sqrt{k}(S_{nj}(x_j) - x_j) + l_j(x) W_{l,j}(x_j)|.$$

Note that all the terms on the right-hand side of the above inequality can
be dealt with in the same way. Therefore, we consider only the first term.
For $\delta \in (0, T)$, this term is bounded by

$$\begin{aligned}
&\sup_{x \in [0,T]^d} |l_1(\xi_n)| \cdot \sup_{x_1 \in [0,T]} |\sqrt{k}(S_{n1}(x_1) - x_1) + W_{l,1}(x_1)| \\
&\quad + \sup_{x \in [\delta,T] \times [0,T]^{d-1}} |l_1(\xi_n) - l_1(x)| \cdot \sup_{x_1 \in [0,T]} |W_{l,1}(x_1)| \\
&\quad + \sup_{x \in [0,\delta] \times [0,T]^{d-1}} |l_1(\xi_n) - l_1(x)| \cdot \sup_{x_1 \in [0,\delta]} |W_{l,1}(x_1)| \\
&=: D_4 \cdot D_5 + D_6 \cdot D_7 + D_8 \cdot D_9.
\end{aligned}$$

Observe that $0 \le l_1 \le 1$. Also, since $l_1$ is continuous on $[\delta/2, T] \times [0,T]^{d-1}$,
it is uniformly continuous on that region. We have $D_5 \xrightarrow{P} 0$ by (7.3), so
$D_4 \cdot D_5 \xrightarrow{P} 0$. The uniform continuity of $l_1$ and the fact that almost surely
$D_7 < \infty$ yield $D_6 \cdot D_7 \xrightarrow{P} 0$. Finally, for every $\varepsilon > 0$, we can find a $\delta > 0$ such
that, with probability at least $1 - \varepsilon$, $D_9 < \varepsilon$ and, hence, $D_8 \cdot D_9 < \varepsilon$.
Applying (7.1) completes the proof. □

Proposition 7.3. If conditions (C1) and (C2) from Theorem 4.2 hold,
then as $n \to \infty$,

$$\sqrt{k} \int_{[0,1]^d} g(x) (\hat l_n(x) - l(x))\, dx \xrightarrow{d} \tilde B. \tag{7.4}$$



Proof. Throughout the proof we write $l(x)$ instead of $l(x; \theta_0)$. Also,
since $l$ does not need to be differentiable, we will use the notation $l_j(x)$, $j =
1, \ldots, d$, to denote the right-hand partial derivatives here. Let $D_1(x)$, $D_2(x)$,
$D_3(x)$ be as in the proof of Theorem 4.6 and take $T = 1$. Then

$$\begin{aligned}
&\Bigl| \sqrt{k} \Bigl( \int_{[0,1]^d} g(x) L_n(x)\, dx - \int_{[0,1]^d} g(x) l(x)\, dx \Bigr) - \tilde B \Bigr| \\
&\quad \le \sup_{x \in [0,1]^d} |D_1(x) - W_l(x)| \int_{[0,1]^d} |g(x)|\, dx + \sup_{x \in [0,1]^d} |D_2(x)| \int_{[0,1]^d} |g(x)|\, dx \\
&\qquad + \int_{[0,1]^d} |g(x)| \cdot \Bigl| D_3(x) + \sum_{j=1}^{d} l_j(x) W_{l,j}(x_j) \Bigr|\, dx.
\end{aligned}$$

The first two terms on the right-hand side converge to zero in probability
due to integrability of $g$ and uniform convergence of $D_1(x)$ and $D_2(x)$,
which was shown in the proof of Theorem 4.6. The third term needs to be
treated separately, as the condition on continuity (and existence) of partial
derivatives is no longer assumed to hold.

Let $\omega$ be a point in the Skorohod probability space introduced before the
proof of Theorem 4.6 such that for all $j = 1, \ldots, d$,

$$\sup_{x_j \in [0,1]} |W_{l,j}(x_j)| < +\infty \quad \text{and} \quad \sup_{x_j \in [0,1]} |\sqrt{k}(S_{nj}(x_j) - x_j) + W_{l,j}(x_j)| \to 0.$$

For such $\omega$ we will show by means of dominated convergence that

$$\int_{[0,1]^d} |g(x)| \cdot \Bigl| \sqrt{k}(l(S_n(x)) - l(x)) + \sum_{j=1}^{d} l_j(x) W_{l,j}(x_j) \Bigr|\, dx \to 0. \tag{7.5}$$

Proof of the pointwise convergence. If $l$ is differentiable, convergence
of the above integrand to zero follows from the definition of partial
derivatives and (7.3). Since this might fail only on a set of Lebesgue measure
zero, the convergence of the integrand to zero holds almost everywhere on
$[0,1]^d$.

Proof of the domination. Note that from expressions for (one-sided)
partial derivatives (2.7), and the moment conditions (2.3), it follows that
$0 \le l_j(x) \le 1$, for all $x \in [0,1]^d$ and all $j = 1, \ldots, d$.

We get

$$|g(x)| \cdot \Bigl| \sqrt{k}(l(S_n(x)) - l(x)) + \sum_{j=1}^{d} l_j(x) W_{l,j}(x_j) \Bigr| \le |g(x)| \cdot \Bigl( \sqrt{k}\, |l(S_n(x)) - l(x)| + \sum_{j=1}^{d} |W_{l,j}(x_j)| \Bigr).$$

Using the definition of the function $l$ and uniformity of $1 - F_j(X_{ij})$, we have,
for all $j = 1, \ldots, d$,

$$|l(x_1, \ldots, x_{j-1}, x_j, x_{j+1}, \ldots, x_d) - l(x_1, \ldots, x_{j-1}, x_j', x_{j+1}, \ldots, x_d)| \le |x_j - x_j'|.$$

Hence, we can write

$$\begin{aligned}
\sup_{x \in [0,1]^d} \sqrt{k}\, |l(S_n(x)) - l(x)|
&\le \sup_{x \in [0,1]^d} \sqrt{k}\, |l(S_n(x)) - l(x_1, S_{n2}(x_2), \ldots, S_{nd}(x_d))| \\
&\quad + \sup_{x \in [0,1]^d} \sqrt{k}\, |l(x_1, S_{n2}(x_2), S_{n3}(x_3), \ldots, S_{nd}(x_d)) - l(x_1, x_2, S_{n3}(x_3), \ldots, S_{nd}(x_d))| + \cdots \\
&\quad + \sup_{x \in [0,1]^d} \sqrt{k}\, |l(x_1, \ldots, x_{d-1}, S_{nd}(x_d)) - l(x)| \\
&\le \sum_{j=1}^{d} \sup_{x_j \in [0,1]} \sqrt{k}\, |S_{nj}(x_j) - x_j| = O(1).
\end{aligned}$$

Since for all $j = 1, \ldots, d$ we have $\sup_{x_j \in [0,1]} |W_{l,j}(x_j)| < +\infty$, the proof of (7.5)
is complete. This, together with (7.1), finishes the proof of the proposition. □

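Let $\nabla Q_{k,n}(\theta) \in \mathbb{R}^{p \times 1}$ be the gradient vector of $Q_{k,n}$ at $\theta$. Put
$V(\theta) := 4 \dot\varphi(\theta)^T \Sigma(\theta) \dot\varphi(\theta) \in \mathbb{R}^{p \times p}$.

Lemma 7.4. If the assumptions of Theorem 4.2 are satisfied, then as
$n \to \infty$,

$$\sqrt{k}\, \nabla Q_{k,n}(\theta_0) \xrightarrow{d} N(0, V(\theta_0)).$$

Proof. The gradient vector of $Q_{k,n}$ with respect to $\theta$ in $\theta_0$ is

$$\nabla Q_{k,n}(\theta_0) = \Bigl( \frac{\partial}{\partial \theta_1} Q_{k,n}(\theta) \Big|_{\theta = \theta_0}, \ldots, \frac{\partial}{\partial \theta_p} Q_{k,n}(\theta) \Big|_{\theta = \theta_0} \Bigr)^T,$$

where for $i = 1, \ldots, p$,

$$\frac{\partial}{\partial \theta_i} Q_{k,n}(\theta) \Big|_{\theta = \theta_0} = -2 \sum_{m=1}^{q} \int_{[0,1]^d} g_m(x) \frac{\partial}{\partial \theta_i} l(x; \theta) \Big|_{\theta = \theta_0}\, dx \cdot \int_{[0,1]^d} g_m(x) (\hat l_n(x) - l(x; \theta_0))\, dx.$$

Using vector notation, we obtain

$$\nabla Q_{k,n}(\theta_0) = -2 \dot\varphi(\theta_0)^T \int_{[0,1]^d} g(x) (\hat l_n(x) - l(x; \theta_0))\, dx.$$

Equation (7.1) and the proof of Proposition 7.3 imply that

$$\sqrt{k}\, \nabla Q_{k,n}(\theta_0) = -2 \dot\varphi(\theta_0)^T \int_{[0,1]^d} g(x) \sqrt{k} (\hat l_n(x) - l(x; \theta_0))\, dx \xrightarrow{d} -2 \dot\varphi(\theta_0)^T \tilde B.$$

The limit distribution of $\sqrt{k}\, \nabla Q_{k,n}(\theta_0)$ is therefore zero-mean Gaussian with
covariance matrix $V(\theta_0) = 4 \dot\varphi(\theta_0)^T \Sigma(\theta_0) \dot\varphi(\theta_0)$. □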

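Proof of Theorem 4.2. Consider the function $f(t) := \nabla Q_{k,n}(\theta_0 +
t(\hat\theta_n - \theta_0))$, $t \in [0, 1]$. The mean value theorem yields

$$\nabla Q_{k,n}(\hat\theta_n) = \nabla Q_{k,n}(\theta_0) + \mathcal{H}_{k,n}(\tilde\theta_n)(\hat\theta_n - \theta_0)$$

for some $\tilde\theta_n$ between $\theta_0$ and $\hat\theta_n$. First note that, with probability tending
to one, $0 = \nabla Q_{k,n}(\hat\theta_n)$, which follows from the fact that $\hat\theta_n$ is a minimizer
of $Q_{k,n}$ and that, with probability tending to one, $\hat\theta_n$ is in an open ball
around $\theta_0$. By the consistency of $\hat\theta_n$, we have that $\tilde\theta_n \xrightarrow{P} \theta_0$, and since the
convergence of $\mathcal{H}_{k,n}$ to $\mathcal{H}$ is uniform on a neighborhood of $\theta_0$, we get that
$\mathcal{H}_{k,n}(\tilde\theta_n) \xrightarrow{P} \mathcal{H}(\theta_0)$. Hence, by Lemma 7.4 and Slutsky's lemma,
$\sqrt{k}(\hat\theta_n - \theta_0) = -\mathcal{H}_{k,n}(\tilde\theta_n)^{-1} \sqrt{k}\, \nabla Q_{k,n}(\theta_0) \xrightarrow{d} N(0, M(\theta_0))$. □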

Proof of Corollary 4.3. As in Lemma 7.2 in Einmahl, Krajina and
Segers (2008), we can see that if $\theta \mapsto H_\theta$ is weakly continuous at $\theta_0$, then
$\theta \mapsto \Sigma(\theta)$ is continuous at $\theta_0$. This, together with the assumption that $\varphi$
is twice continuously differentiable and $\dot\varphi(\theta_0)$ is of full rank, yields that
$\theta \mapsto V(\theta)$ is continuous at $\theta_0$. The above assumption also implies that $\theta \mapsto
\mathcal{H}(\theta)$ is continuous at $\theta_0$, which, with the positive definiteness of $\mathcal{H}(\theta)$ in
a neighborhood of $\theta_0$, shows that if $\theta \mapsto H_\theta$ is weakly continuous at $\theta_0$, then
$\theta \mapsto M(\theta) = \mathcal{H}(\theta)^{-1} V(\theta) \mathcal{H}(\theta)^{-1}$ is continuous at $\theta_0$. Hence, we obtain

$$M(\hat\theta_n)^{-1/2} \sqrt{k}(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, I_p),$$

which yields (4.3). □

Proof of Theorem 4.4. Theorem 4.2 and the arguments used in the
proof of Corollary 4.3 imply that, as $n \to \infty$,

$$M_2^{-1/2}(\hat\theta_1, \theta_2^*)\, \sqrt{k}(\hat\theta_2 - \theta_2^*) \xrightarrow{d} N(0, I_r) \tag{7.6}$$

and hence (4.10). □

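Proof of Lemma 6.1. We have

$$\begin{aligned}
l(x_1, \ldots, x_d)
&= \lim_{t \to \infty} t\, P(1 - F_1(X_1) \le x_1/t \text{ or } \ldots \text{ or } 1 - F_d(X_d) \le x_d/t) \\
&= \lim_{t \to \infty} t\, P(1 - F_{Y_1}(Y_1) \le x_1/t \text{ or } \ldots \text{ or } 1 - F_{Y_d}(Y_d) \le x_d/t) \\
&= \lim_{t \to \infty} t\, P\Bigl(Y_1 > \frac{t \sum_{i=1}^{r} a_{i1}^{\nu}}{x_1} \text{ or } \ldots \text{ or } Y_d > \frac{t \sum_{i=1}^{r} a_{id}^{\nu}}{x_d}\Bigr) \\
&= \lim_{t \to \infty} t\, P\Biggl(\bigcup_{1 \le j \le d}\ \bigcup_{1 \le i \le r} \Bigl\{ a_{ij}^{\nu} W_i > \frac{t \sum_{i'=1}^{r} a_{i'j}^{\nu}}{x_j} \Bigr\}\Biggr) \\
&= \lim_{t \to \infty} t\, P\Biggl(\bigcup_{1 \le i \le r} \Bigl\{ W_i > t \min_{1 \le j \le d} \frac{\sum_{i'=1}^{r} a_{i'j}^{\nu}}{a_{ij}^{\nu} x_j} \Bigr\}\Biggr) \\
&= \lim_{t \to \infty} t \sum_{i=1}^{r} P\Bigl( W_i > t \min_{1 \le j \le d} \frac{\sum_{i'=1}^{r} a_{i'j}^{\nu}}{a_{ij}^{\nu} x_j} \Bigr) \\
&= \lim_{t \to \infty} t \sum_{i=1}^{r} \Bigl( 1 - \exp\Bigl\{ -\frac{1}{t} \max_{1 \le j \le d} \frac{a_{ij}^{\nu} x_j}{\sum_{i'=1}^{r} a_{i'j}^{\nu}} \Bigr\} \Bigr) \\
&= \sum_{i=1}^{r} \max_{1 \le j \le d} \Bigl\{ \frac{a_{ij}^{\nu} x_j}{\sum_{i'=1}^{r} a_{i'j}^{\nu}} \Bigr\} =: \sum_{i=1}^{r} \max_{1 \le j \le d} \{b_{ij} x_j\}
\end{aligned}$$

as required. □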

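Proof of Lemma 6.2. Fix $i \in \{1, \ldots, r\}$. We have

$$\int_{[0,1]^d} x_k^s \max_{j=1,\ldots,d} \{b_{ij} x_j\}\, dx = \sum_{j=1}^{d} \int_{[0,1]^d} x_k^s\, b_{ij} x_j\, \mathbf{1}\{b_{ij} x_j \ge b_{il} x_l \text{ for all } l \ne j\}\, dx.$$

Write the integral as a double integral, the outer integral with respect to
$x_j \in [0, 1]$ and the inner integral with respect to $x_{-j} = (x_l)_{l \ne j} \in [0,1]^{d-1}$ over
the relevant domain. We find

$$\int_{[0,1]^d} x_k^s \max_{j=1,\ldots,d} \{b_{ij} x_j\}\, dx = \sum_{j=1}^{d} \int_0^1 b_{ij} x_j \int_{\{x_{-j} \in [0,1]^{d-1} :\, b_{il} x_l \le b_{ij} x_j,\ l \ne j\}} x_k^s\, dx_{-j}\, dx_j.$$

After some long, but elementary computations, this simplifies to the stated
expression. □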



Acknowledgments. We are grateful to Axel Bücher for pointing out that
the original condition (C3) of Theorem 4.6 was too restrictive. We would also
like to thank the Associate Editor and two referees for a thorough reading of the
manuscript and for many thoughtful comments that led to this improved
version.



REFERENCES 

Ballani, F. and Schlather, M. (2011). A construction principle for multivariate extreme value distributions. Biometrika 98 633-645. MR2836411

Beirlant, J., Goegebeur, Y., Teugels, J. and Segers, J. (2004). Statistics of Extremes: Theory and Applications. Wiley, Chichester. MR2108013

Boldi, M. O. and Davison, A. C. (2007). A mixture model for multivariate extremes. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 217-229. MR2325273

Coles, S. G. and Tawn, J. A. (1991). Modelling extreme multivariate events. J. R. Stat. Soc. Ser. B Stat. Methodol. 53 377-392. MR1108334

Cooley, D., Davis, R. A. and Naveau, P. (2010). The pairwise beta distribution: A flexible parametric multivariate model for extremes. J. Multivariate Anal. 101 2103-2117. MR2671204

de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, New York. MR2234156

de Haan, L., Neves, C. and Peng, L. (2008). Parametric tail copula estimation and model testing. J. Multivariate Anal. 99 1260-1275. MR2419346

de Haan, L. and Resnick, S. I. (1977). Limit theory for multivariate sample extremes. Z. Wahrsch. Verw. Gebiete 40 317-337. MR0478290

Drees, H. and Huang, X. (1998). Best attainable rates of convergence for estimators of the stable tail dependence function. J. Multivariate Anal. 64 25-47. MR1619974

Einmahl, J. H. J. (1997). Poisson and Gaussian approximation of weighted local empirical processes. Stochastic Process. Appl. 70 31-58. MR1472958

Einmahl, J. H. J., Krajina, A. and Segers, J. (2008). A method of moments estimator of tail dependence. Bernoulli 14 1003-1026. MR2543584

Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Applications of Mathematics (New York) 33. Springer, Berlin. MR1458613

Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33 3-56.

Fama, E. F. and French, K. R. (1996). Multifactor explanations of asset pricing anomalies. J. Finance 51 55-84.

Frees, E. W. and Valdez, E. A. (1998). Understanding relationships using copulas. N. Am. Actuar. J. 2 1-25. MR1988432





Geluk, J. L., de Haan, L. and de Vries, C. G. (2007). Weak and strong financial fragility. Technical Report 2007-023/2, Tinbergen Institute.

Guillotte, S., Perron, F. and Segers, J. (2011). Non-parametric Bayesian inference on bivariate extremes. J. R. Stat. Soc. Ser. B Stat. Methodol. 73 377-406. MR2815781

Gumbel, E. J. (1960). Bivariate exponential distributions. J. Amer. Statist. Assoc. 55 698-707. MR0116403

Huang, X. (1992). Statistics of bivariate extreme values. Ph.D. thesis, Tinbergen Institute Research Series.

Joe, H., Smith, R. L. and Weissman, I. (1992). Bivariate threshold methods for extremes. J. R. Stat. Soc. Ser. B Stat. Methodol. 54 171-183. MR1157718

Kleibergen, F. (2011). Reality checks for and of factor pricing. Technical report, Dept. Economics, Brown Univ. Preprint. Available at http://www.econ.brown.edu/fac/Frank_Kleibergen/.

Ledford, A. W. and Tawn, J. A. (1996). Statistics for near independence in multivariate extreme values. Biometrika 83 169-187. MR1399163

Ledford, A. W. and Tawn, J. A. (1998). Concomitant tail behaviour for extremes. Adv. in Appl. Probab. 30 197-215. MR1618837

Malevergne, Y. and Sornette, D. (2004). Tail dependence of factor models. Journal of Risk 6 71-116.

Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York. MR0762984

Resnick, S. I. (1987). Extreme Values, Regular Variation, and Point Processes. Applied Probability. A Series of the Applied Probability Trust 4. Springer, New York. MR0900810

Ribatet, M. (2011). POT: Generalized Pareto distribution and peaks over threshold. R package Version 1.1-1.

Smith, R. L. (1994). Multivariate threshold methods. In Extreme Value Theory and Applications (J. Galambos, J. Lechner and E. Simiu, eds.) 225-248. Kluwer Academic, Dordrecht.

Tawn, J. A. (1988). Bivariate extreme value theory: Models and estimation. Biometrika 75 397-415. MR0967580

Vervaat, W. (1972). Functional central limit theorems for processes with positive drift and their inverses. Z. Wahrsch. Verw. Gebiete 23 245-253. MR0321164

Wang, Y. and Stoev, S. A. (2011). Conditional sampling for max-stable random fields. Adv. in Appl. Probab. 43 463-481.

J. H. J. Einmahl
Department of Econometrics and OR
and CentER
Tilburg University
PO Box 90153
5000 LE Tilburg
The Netherlands
E-mail: j.h.j.einmahl@uvt.nl

A. Krajina
Institute for Mathematical Stochastics
University of Göttingen, Göttingen
Germany
E-mail: Andrea.Krajina@mathematik.uni-goettingen.de

J. Segers
ISBA
Université catholique de Louvain
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium
E-mail: johan.segers@uclouvain.be



